
If someone could port the Free Pascal string library to C, it would solve a lot of problems with new C code. It reference counts and does all the management of strings. You never have to allocate or free them, and they can store gigabytes of text. You can delete from the middle of a string too!

They're counted, zero terminated, ASCII or Unicode, and magic as far as I'm concerned.

Oh... And a string copy is an O(1) operation, as it only makes a real copy on modification.

Edit: correct to O(1), thanks mort96
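A rough C sketch of that layout, a hidden header (reference count + length) sitting just before the character data, with the user-visible pointer aimed at the data itself so it doubles as an ordinary zero-terminated C string. Names and layout here are my guesses at the idea, not FPC's actual implementation:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    long refcount;
    long length;
    /* char data[] follows, zero-terminated */
} strheader;

static strheader *hdr(char *s) { return (strheader *)s - 1; }

char *str_new(const char *src) {
    size_t len = strlen(src);
    strheader *h = malloc(sizeof *h + len + 1);
    h->refcount = 1;
    h->length = (long)len;
    char *data = (char *)(h + 1);
    memcpy(data, src, len + 1);   /* copies the terminator too */
    return data;                  /* usable as a plain C string */
}

long str_len(char *s) { return hdr(s)->length; }   /* O(1), no scan */

void str_release(char *s) {
    if (--hdr(s)->refcount == 0)
        free(hdr(s));
}
```

Because the pointer targets the data, any C API that expects a `const char *` can consume the string unmodified.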



For most strings, it seems to me that using a varint would solve the overhead problem. For short strings the overhead would be no larger than the null byte (which you could discard, except when interacting with existing APIs).

But as with _all_ string solutions, it's the POSIX interface, standard library, and other libraries that screw you. If you're programming in C today, it's because you're integrating with a ton of C code, and so it's hard to make progress since the scope for improvement is so small.
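The varint idea can be sketched with a LEB128-style encoding: 7 payload bits per byte, high bit as a continuation flag, so any length up to 127 costs exactly one byte. Function names here are mine, just for illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode an unsigned length as a LEB128-style varint.
 * Returns the number of bytes written (1..10 for uint64_t). */
size_t varint_encode(uint64_t v, unsigned char *out) {
    size_t n = 0;
    do {
        unsigned char b = v & 0x7f;
        v >>= 7;
        if (v) b |= 0x80;         /* more bytes follow */
        out[n++] = b;
    } while (v);
    return n;
}

/* Decode a varint; returns the number of bytes consumed. */
size_t varint_decode(const unsigned char *in, uint64_t *v) {
    size_t n = 0;
    unsigned shift = 0;
    unsigned char b;
    *v = 0;
    do {
        b = in[n++];
        *v |= (uint64_t)(b & 0x7f) << shift;
        shift += 7;
    } while (b & 0x80);
    return n;
}
```

A length-prefixed short string would then carry one byte of metadata, the same cost as the NUL terminator it replaces.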

It's always struck me as weird that Rust treats strings the way it does - the capacity value is not useful for many kinds of strings, and it would have cost them one bit to special-case the handling of constant strings without the capacity field, which would be better. Most strings are _short_, which makes the overhead proportionally worse.
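A rough C mirror of the layouts being discussed (illustrative structs, not Rust's actual definitions): an owning, growable string carries pointer + length + capacity, while a borrowed slice carries only pointer + length, so dropping the capacity word saves one machine word per string.

```c
#include <assert.h>
#include <stddef.h>

/* Roughly like Rust's String: owns a heap buffer it may grow. */
struct owned { char *ptr; size_t len; size_t cap; };

/* Roughly like Rust's &str: a borrowed view, no capacity needed. */
struct slice { char *ptr; size_t len; };
```

On a typical 64-bit target that is 24 bytes of header versus 16; for a ten-character string the capacity word alone is a sizable fraction of the total.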


It's not like there is any shortage of alternative string libraries for C; sometimes I feel everyone has gone and invented their own.

Antirez's sds is just one example: https://github.com/antirez/sds


Pascal strings have an overhead of two ints per string (16 bytes on 64-bit systems).

The kind of person who calls a single pass through the string a "horribly inefficient solution" will faint at the idea of burdening every string with 16 more bytes of data.


it's pretty trivial to implement this with at most 14 bytes of overhead (with small string optimization), but more importantly, it's only 16 bytes on 64-bit systems, which pretty much by definition aren't that memory constrained (since otherwise you would be on 32-bit).
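A sketch of that small-string optimization: one 16-byte object either stores up to 14 characters inline (plus terminator) or points at heap data, with the high bit of the final byte distinguishing the two cases. This layout assumes a little-endian 64-bit target, and all names are illustrative:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef union {
    struct { char buf[15]; unsigned char len; } small; /* high bit of len clear */
    struct { char *ptr; uint64_t info; } big;          /* high bit of byte 15 set */
} sso_str;

int sso_is_small(const sso_str *s) {
    return (s->small.len & 0x80) == 0;
}

void sso_init(sso_str *s, const char *src) {
    size_t n = strlen(src);
    if (n + 1 <= sizeof s->small.buf) {   /* up to 14 chars + '\0' inline */
        memcpy(s->small.buf, src, n + 1);
        s->small.len = (unsigned char)n;
    } else {                              /* spill to the heap */
        s->big.ptr = malloc(n + 1);
        memcpy(s->big.ptr, src, n + 1);
        s->big.info = (uint64_t)n;
        s->small.len |= 0x80;             /* tag the heap case */
    }
}

size_t sso_len(const sso_str *s) {
    return sso_is_small(s) ? s->small.len
                           : (size_t)(s->big.info & ~(0x80ull << 56));
}

const char *sso_data(const sso_str *s) {
    return sso_is_small(s) ? s->small.buf : s->big.ptr;
}
```

For short strings the whole thing fits where a bare pointer-plus-length would go, so the marginal overhead disappears exactly where it would hurt most.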


> ...it's only 16 bytes on 64-bit systems, which pretty much by definition aren't that memory constrained (since otherwise you would be on 32-bit).

I'm not sure about that. There are plenty of 64-bit systems with less than 4GB of RAM.


Including your laptop if you have a few Electron apps open!


Maybe other people's laptops, but 32GB is table-stakes for any laptop I'll buy.


For some reason Multics got a higher security score from DoD than UNIX, guess why.


This is a purely psychological problem. I’d say most of C is psychological, not technical. If I were a world dictator, one of my first orders would be to lock C developers in a room with only Python for a few months. Or Ruby, in severe cases. Some of them really need to touch grass.


> If I were a world dictator, one of my first orders would be to lock C developers in a room with only Python for a few months. Or Ruby, in severe cases.

I would additionally do the exact opposite: lock Python & Ruby developers in a room with only C for a few months.

C is a great language for learning programming, but Python and Ruby are nowadays, in most cases, better languages to program in. For example, C's sharp edges are a notorious source of bugs; yet they force you to develop rigor and discipline.


But if you only give them a few months that's barely enough time to run a simple Hello World.


As an aside: can you have an O(0) operation that actually does anything?


It doesn't really make sense within the context of complexity analysis as something distinct from constant-time, which is denoted with O(1). A copy of a CoW string is O(1).


This. With complexity analysis, you pretty much factor out any constants and only look at the term with the highest growth rate, so you end up with 1 < ln n < n < n^a < 2^n (this can be extended indefinitely by replacing the n in the last case with anything to its right, but in practice these are the only ones that matter).


Stuff that happens at compile-time is O(0) (well, technically it's amortized over the number of times you run the compiled code, eh? Huh, how does JIT compilation affect Big-O analysis?)


O(0) is essentially meaningless. The only way a task could possibly be O(0) is if it isn't done at all, as even if the task is guaranteed to run in one Planck time [0], that's still constant time and would be O(1).

[0] https://simple.wikipedia.org/wiki/Planck_time


It copies the pointer to the data and increments the reference count. When you modify a string it checks the count and copies it prior to modification if it's not 1.
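That scheme can be sketched in C roughly like this (names and layout are mine, not FPC's): copying just bumps a reference count, and writing through a handle whose buffer is shared clones the buffer first.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    long refcount;
    size_t len;
    char data[];      /* flexible array member, zero-terminated */
} cow_buf;

cow_buf *cow_new(const char *src) {
    size_t n = strlen(src);   /* text assumed NUL-free */
    cow_buf *b = malloc(sizeof *b + n + 1);
    b->refcount = 1;
    b->len = n;
    memcpy(b->data, src, n + 1);
    return b;
}

cow_buf *cow_copy(cow_buf *b) {   /* O(1): no bytes are moved */
    b->refcount++;
    return b;
}

void cow_set(cow_buf **h, size_t i, char c) {
    cow_buf *b = *h;
    if (b->refcount > 1) {        /* shared: break the copy now */
        b->refcount--;
        b = cow_new(b->data);
        *h = b;
    }
    b->data[i] = c;               /* now safe: we own it exclusively */
}

void cow_release(cow_buf *b) {
    if (--b->refcount == 0)
        free(b);
}
```

The check-and-clone in `cow_set` is what makes the cheap copy safe: readers can share one buffer indefinitely, and only the first writer pays for a real copy.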


Only if you have an operation that actually does something in precisely 0 time.


The Better String Library (aka bstring, not to be confused with COM’s BSTR) is fairly nice:

https://bstring.sourceforge.net/

The string keeps track of the buffer size and how much has been used, allowing allocations to be somewhat amortized. The string buffer itself is zero-terminated for easy interop with code that expects standard C strings.

    struct tagbstring {
        int mlen;
        int slen;
        unsigned char * data;
    };
I used it on a microcontroller where I wanted something small and simple. The main missing feature is the lack of a small-string optimization like some implementations of std::string have. (Before anyone complains about this string type being too inefficient for a microcontroller: I had 1 MB of flash and 192 KB of RAM, so I was not super constrained for resources.)
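The capacity/length split is what buys the amortization. A minimal re-implementation of the idea (my own toy code, not bstrlib's actual API): `mlen` tracks the allocated capacity and `slen` the bytes in use, so repeated appends can grow the buffer geometrically while keeping it zero-terminated for C interop.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    int mlen;              /* allocated capacity */
    int slen;              /* bytes in use */
    unsigned char *data;   /* zero-terminated buffer */
} mystring;

void mystr_init(mystring *s) {
    s->mlen = 8;
    s->slen = 0;
    s->data = malloc(s->mlen);
    s->data[0] = '\0';
}

void mystr_cat(mystring *s, const char *src) {
    int n = (int)strlen(src);
    if (s->slen + n + 1 > s->mlen) {          /* grow geometrically */
        while (s->slen + n + 1 > s->mlen)
            s->mlen *= 2;
        s->data = realloc(s->data, s->mlen);
    }
    memcpy(s->data + s->slen, src, n + 1);    /* keep the terminator */
    s->slen += n;
}
```

Doubling means a sequence of k appends does O(k) total copying instead of O(k^2), which is exactly what the naive strcat-in-a-loop pattern gets wrong.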


Man, I want Ada.Strings.Fixed, Ada.Strings.Bounded, and Ada.Strings.Unbounded in C.



