
If someone could port the Free Pascal string library to C, it would solve a lot of problems with new C code. It reference counts and does all the management of strings. You never have to allocate or free them, and they can store gigabytes of text. You can delete from the middle of a string too!

They're counted, zero terminated, ASCII or Unicode, and magic as far as I'm concerned.

Oh... And a string copy is an O(1) operation, as it only makes a real copy on modification.

Edit: correct to O(1), thanks mort96
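A rough C sketch of that layout, a hidden header (reference count + length) sitting just before the character data, with the user-visible pointer aimed at the data itself so it doubles as an ordinary zero-terminated C string. Names and layout here are my guesses at the idea, not FPC's actual implementation:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    long refcount;
    long length;
    /* char data[] follows, zero-terminated */
} strheader;

static strheader *hdr(char *s) { return (strheader *)s - 1; }

char *str_new(const char *src) {
    size_t len = strlen(src);
    strheader *h = malloc(sizeof *h + len + 1);
    h->refcount = 1;
    h->length = (long)len;
    char *data = (char *)(h + 1);
    memcpy(data, src, len + 1);   /* copies the terminator too */
    return data;                  /* usable as a plain C string */
}

long str_len(char *s) { return hdr(s)->length; }   /* O(1), no scan */

void str_release(char *s) {
    if (--hdr(s)->refcount == 0)
        free(hdr(s));
}
```

Because the pointer targets the data, any C API that expects a `const char *` can consume the string unmodified.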



For most strings, it seems to me that using a varint would solve the overhead problem. For short strings the overhead would be no larger than the null byte (which you could discard, except when interacting with existing APIs).

But as with _all_ string solutions, it's the POSIX interface, standard library, and other libraries that screw you. If you're programming in C today, it's because you're integrating with a ton of C code, and so it's hard to make progress since the scope for improvement is so small.
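The varint idea can be sketched with a LEB128-style encoding: 7 payload bits per byte, high bit as a continuation flag, so any length up to 127 costs exactly one byte. Function names here are mine, just for illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode an unsigned length as a LEB128-style varint.
 * Returns the number of bytes written (1..10 for uint64_t). */
size_t varint_encode(uint64_t v, unsigned char *out) {
    size_t n = 0;
    do {
        unsigned char b = v & 0x7f;
        v >>= 7;
        if (v) b |= 0x80;         /* more bytes follow */
        out[n++] = b;
    } while (v);
    return n;
}

/* Decode a varint; returns the number of bytes consumed. */
size_t varint_decode(const unsigned char *in, uint64_t *v) {
    size_t n = 0;
    unsigned shift = 0;
    unsigned char b;
    *v = 0;
    do {
        b = in[n++];
        *v |= (uint64_t)(b & 0x7f) << shift;
        shift += 7;
    } while (b & 0x80);
    return n;
}
```

A length-prefixed short string would then carry one byte of metadata, the same cost as the NUL terminator it replaces.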

It's always struck me as weird that Rust treats strings the way it does - the capacity value is not useful for many kinds of strings, and it would have cost them one bit to special-case the handling of constant strings without the capacity field, which would be better. Most strings are _short_, which makes the overhead proportionally worse.
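A rough C mirror of the layouts being discussed (illustrative structs, not Rust's actual definitions): an owning, growable string carries pointer + length + capacity, while a borrowed slice carries only pointer + length, so dropping the capacity word saves one machine word per string.

```c
#include <assert.h>
#include <stddef.h>

/* Roughly like Rust's String: owns a heap buffer it may grow. */
struct owned { char *ptr; size_t len; size_t cap; };

/* Roughly like Rust's &str: a borrowed view, no capacity needed. */
struct slice { char *ptr; size_t len; };
```

On a typical 64-bit target that is 24 bytes of header versus 16; for a ten-character string the capacity word alone is a sizable fraction of the total.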


It's not like there is any shortage of alternative string libraries for C; sometimes I feel everyone has gone and invented their own.

Antirez's sds is just one example: https://github.com/antirez/sds


Pascal strings have an overhead of two ints per string (16 bytes on 64-bit systems).

The kind of person who calls a single pass through the string a "horribly inefficient solution" will faint at the idea of burdening every string with 16 more bytes of data.


it's pretty trivial to implement this with at most 14 bytes of overhead (with small string optimization), but more importantly, it's only 16 bytes on 64-bit systems, which pretty much by definition aren't that memory constrained (since otherwise you would be on 32-bit).
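A sketch of that small-string optimization: one 16-byte object either stores up to 14 characters inline (plus terminator) or points at heap data, with the high bit of the final byte distinguishing the two cases. This layout assumes a little-endian 64-bit target, and all names are illustrative:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef union {
    struct { char buf[15]; unsigned char len; } small; /* high bit of len clear */
    struct { char *ptr; uint64_t info; } big;          /* high bit of byte 15 set */
} sso_str;

int sso_is_small(const sso_str *s) {
    return (s->small.len & 0x80) == 0;
}

void sso_init(sso_str *s, const char *src) {
    size_t n = strlen(src);
    if (n + 1 <= sizeof s->small.buf) {   /* up to 14 chars + '\0' inline */
        memcpy(s->small.buf, src, n + 1);
        s->small.len = (unsigned char)n;
    } else {                              /* spill to the heap */
        s->big.ptr = malloc(n + 1);
        memcpy(s->big.ptr, src, n + 1);
        s->big.info = (uint64_t)n;
        s->small.len |= 0x80;             /* tag the heap case */
    }
}

size_t sso_len(const sso_str *s) {
    return sso_is_small(s) ? s->small.len
                           : (size_t)(s->big.info & ~(0x80ull << 56));
}

const char *sso_data(const sso_str *s) {
    return sso_is_small(s) ? s->small.buf : s->big.ptr;
}
```

For short strings the whole thing fits where a bare pointer-plus-length would go, so the marginal overhead disappears exactly where it would hurt most.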


> ...it's only 16 bytes on 64-bit systems, which pretty much by definition aren't that memory constrained (since otherwise you would be on 32-bit).

I'm not sure about that. There are plenty of 64-bit systems with less than 4GB of RAM.


Including your laptop if you have a few Electron apps open!


Maybe other people's laptops, but 32GB is table-stakes for any laptop I'll buy.


For some reason Multics got a higher security score from DoD than UNIX, guess why.


This is a purely psychological problem. I’d say most of C is psychological, not technical. If I were a world dictator, one of my first orders would be to lock C developers in a room with only Python for a few months. Or Ruby, in severe cases. Some of them really need to touch grass.


> If I were a world dictator, one of my first orders would be to lock C developers in a room with only Python for a few months. Or Ruby, in severe cases.

I would additionally do the exact opposite: lock Python & Ruby developers in a room with only C for a few months.

C is a great language for learning programming, but Python and Ruby are nowadays, in most cases, better languages to program in. For example, C's sharp edges are a notorious source of bugs; yet they force you to develop rigor and discipline.


But if you only give them a few months that's barely enough time to run a simple Hello World.


As an aside: can you have an O(0) operation that actually does anything?


It doesn't really make sense within the context of complexity analysis as something distinct from constant-time, which is denoted with O(1). A copy of a CoW string is O(1).


This. With complexity analysis, you pretty much factor out any constants and only look at the term with the highest growth rate, so you end up with 1 < ln n < n < n^a < 2^n (this can be extended indefinitely by replacing the n in the last case with anything to its right, but in practice these are the only ones that matter).


Stuff that happens at compile-time is O(0) (well, technically it's amortized over the number of times you run the compiled code, eh? Huh, how does JIT compilation affect Big-O analysis?)


O(0) is essentially meaningless. The only way a task could possibly be O(0) is if it isn't done at all, as even if the task is guaranteed to run in one Planck time [0], that's still constant time and would be O(1).

[0] https://simple.wikipedia.org/wiki/Planck_time


It copies the pointer to the data and increments the reference count. When you modify a string it checks the count and copies it prior to modification if it's not 1.
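That scheme can be sketched in C roughly like this (names and layout are mine, not FPC's): copying just bumps a reference count, and writing through a handle whose buffer is shared clones the buffer first.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    long refcount;
    size_t len;
    char data[];      /* flexible array member, zero-terminated */
} cow_buf;

cow_buf *cow_new(const char *src) {
    size_t n = strlen(src);   /* text assumed NUL-free */
    cow_buf *b = malloc(sizeof *b + n + 1);
    b->refcount = 1;
    b->len = n;
    memcpy(b->data, src, n + 1);
    return b;
}

cow_buf *cow_copy(cow_buf *b) {   /* O(1): no bytes are moved */
    b->refcount++;
    return b;
}

void cow_set(cow_buf **h, size_t i, char c) {
    cow_buf *b = *h;
    if (b->refcount > 1) {        /* shared: break the copy now */
        b->refcount--;
        b = cow_new(b->data);
        *h = b;
    }
    b->data[i] = c;               /* now safe: we own it exclusively */
}

void cow_release(cow_buf *b) {
    if (--b->refcount == 0)
        free(b);
}
```

The check-and-clone in `cow_set` is what makes the cheap copy safe: readers can share one buffer indefinitely, and only the first writer pays for a real copy.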


Only if you have an operation that actually does something in precisely 0 time.


The Better String Library (aka bstring, not to be confused with COM’s BSTR) is fairly nice:

https://bstring.sourceforge.net/

The string keeps track of the buffer size and how much has been used, allowing allocations to be somewhat amortized. The string buffer itself is zero-terminated for easy interop with code that expects standard C strings.

    struct tagbstring {
        int mlen;
        int slen;
        unsigned char * data;
    };
I used it on a microcontroller where I wanted something small and simple. The main missing feature is the lack of a small-string optimization like some implementations of std::string have. (Before anyone complains about this string type being too inefficient for a microcontroller: I had 1 MB of flash and 192 KB of RAM, so I was not super constrained for resources.)
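The capacity/length split is what buys the amortization. A minimal re-implementation of the idea (my own toy code, not bstrlib's actual API): `mlen` tracks the allocated capacity and `slen` the bytes in use, so repeated appends can grow the buffer geometrically while keeping it zero-terminated for C interop.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    int mlen;              /* allocated capacity */
    int slen;              /* bytes in use */
    unsigned char *data;   /* zero-terminated buffer */
} mystring;

void mystr_init(mystring *s) {
    s->mlen = 8;
    s->slen = 0;
    s->data = malloc(s->mlen);
    s->data[0] = '\0';
}

void mystr_cat(mystring *s, const char *src) {
    int n = (int)strlen(src);
    if (s->slen + n + 1 > s->mlen) {          /* grow geometrically */
        while (s->slen + n + 1 > s->mlen)
            s->mlen *= 2;
        s->data = realloc(s->data, s->mlen);
    }
    memcpy(s->data + s->slen, src, n + 1);    /* keep the terminator */
    s->slen += n;
}
```

Doubling means a sequence of k appends does O(k) total copying instead of O(k^2), which is exactly what the naive strcat-in-a-loop pattern gets wrong.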


Man, I want Ada.Strings.Fixed, Ada.Strings.Bounded, and Ada.Strings.Unbounded in C.



