
This is also why C++ (and hopefully other languages by now) has the short string optimization. On a 64-bit machine, a pointer to a heap-allocated buffer is 8 bytes long. A stupidly large number of the strings used by actual programs are <= 7 bytes long, so they fit in the space the pointer alone would take, with one byte left over for a length or tag. (In C++ the inline limit is actually larger, typically 15 to 22 characters depending on the standard library, because a std::string object also holds a length and capacity, and that space can be reused for characters too.) Think about it: that's virtually every file extension, path separator, and field delimiter, plus a good number of field names, labels, first and last names, usernames, identifiers, URL path components, parameter names, etc. If you can fit the whole string into an immediate value, you avoid the heap allocation and deallocation entirely, you avoid chasing a pointer and incurring a cache miss, and you can significantly reduce the total amount of memory used.
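
A minimal sketch of the idea, assuming a hypothetical `SmallString` class. This is not the real layout of any `std::string` implementation (those pack the length, capacity and inline buffer much more cleverly); it just shows the branch: strings of up to 15 characters live in a buffer inside the object, and only longer ones touch the heap.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical small-string-optimization sketch (not std::string's layout).
// sizeof(SmallString) is 24 bytes on a typical 64-bit target:
// 8 for the size plus 16 for the union below.
class SmallString {
public:
    // Characters that fit inline (plus one byte for the '\0' terminator).
    static constexpr std::size_t kInlineCapacity = 15;

    explicit SmallString(std::string_view s) : size_(s.size()) {
        if (is_inline()) {
            s.copy(inline_, size_);   // short: copy into the object itself, no allocation
            inline_[size_] = '\0';
        } else {
            heap_ = new char[size_ + 1];  // long: fall back to a heap buffer
            s.copy(heap_, size_);
            heap_[size_] = '\0';
        }
    }

    ~SmallString() {
        if (!is_inline()) delete[] heap_;  // only heap-backed strings own memory
    }

    // Copying is disabled to keep the sketch short.
    SmallString(const SmallString&) = delete;
    SmallString& operator=(const SmallString&) = delete;

    bool is_inline() const { return size_ <= kInlineCapacity; }
    std::size_t size() const { return size_; }
    // Reading an inline string never dereferences a separate pointer.
    const char* data() const { return is_inline() ? inline_ : heap_; }
    std::string_view view() const { return {data(), size_}; }

private:
    std::size_t size_;
    union {
        char inline_[kInlineCapacity + 1];  // used when size_ <= kInlineCapacity
        char* heap_;                        // used otherwise
    };
};
```

With this sketch, `SmallString(".json")` stays inline while a 16-character string allocates. Real implementations avoid storing a separate size when inline (e.g. by encoding it in a spare byte), which is how libc++ fits 22 characters in a 24-byte object.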

This is also a good reason why, if you're designing a standard library, you really want length-prefixed strings instead of null-terminated ones. With a null-terminated string you have to scan the whole thing just to learn its length. If you already know the lengths of the strings you're dealing with, you can pick the search algorithm up front: brute force for a very small needle, word comparisons if it's exactly 4 bytes long, SSE instructions if its length is a multiple of the word size, or Boyer-Moore for very long strings.
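
A rough sketch of that kind of dispatch, assuming a hypothetical `find_substring` function over `std::string_view` (which carries its length). It covers three of the cases above: a single 32-bit word comparison per position for 4-byte needles, `std::boyer_moore_searcher` (C++17) for long needles, and plain `std::search` (typically a naive scan) otherwise. The SSE path is omitted, and the 32-byte Boyer-Moore cutoff is an assumed number, not a measured one.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <functional>
#include <string_view>

// Hypothetical length-dispatching substring search. Both lengths are known
// up front (no strlen scan), so the strategy is chosen before searching.
// Returns the index of the first match, or std::string_view::npos.
std::size_t find_substring(std::string_view hay, std::string_view needle) {
    const std::size_t n = hay.size(), m = needle.size();
    if (m == 0) return 0;
    if (m > n) return std::string_view::npos;

    // Exactly 4 bytes: compare one 32-bit word per position instead of byte by byte.
    if (m == 4) {
        std::uint32_t want;
        std::memcpy(&want, needle.data(), 4);
        for (std::size_t i = 0; i + 4 <= n; ++i) {
            std::uint32_t got;
            std::memcpy(&got, hay.data() + i, 4);  // memcpy avoids unaligned-access UB
            if (got == want) return i;
        }
        return std::string_view::npos;
    }

    // Assumed cutoff for switching to Boyer-Moore; tune by benchmarking.
    constexpr std::size_t kBoyerMooreThreshold = 32;
    auto it = (m >= kBoyerMooreThreshold)
        // Long needle: Boyer-Moore skips ahead using precomputed shift tables.
        ? std::search(hay.begin(), hay.end(),
                      std::boyer_moore_searcher(needle.begin(), needle.end()))
        // Short needle: brute-force search.
        : std::search(hay.begin(), hay.end(), needle.begin(), needle.end());
    return it == hay.end() ? std::string_view::npos
                           : static_cast<std::size_t>(it - hay.begin());
}
```

Boyer-Moore has a setup cost (building its tables), which is why it only pays off once the needle is long enough; for short needles the naive scan usually wins.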



