I was recently annoyed to find postgres indexes don't support skipping [1] you a...

RedShift1 · 2024-12-07T12:50:37 1733575837

What is a reasonable use for a null character in a string? My first instinct is that strings with nulls in them should absolutely be rejected.

larsnystrom · 2024-12-07T13:06:09 1733576769

There are two kinds of programmers: Those who think of strings as text, and those who think of strings as a sequence of bytes. The second group doesn’t care about the special case where a byte is all zeroes.

mattashii · 2024-12-07T13:21:03 1733577663

In that second case the string is better represented as "bytea", which has most (but not all) of the features of the "text" type.

pdimitar · 2024-12-07T13:44:06 1733579046

I agree with your take, it's just that many programmers want to easily jump from "byte array" to "string in XYZ encoding". I personally prefer byte arrays for unsafe data and to do deserialization in application code.

rlupi · 2024-12-07T15:46:11 1733586371

In other words, considering we are talking about string and unicode...

There are two types of programmers, those that are wrong and those that are very wrong

pdimitar · 2024-12-07T15:49:44 1733586584

lol. :)

Funny but not entirely true. I had cases when we had to urgently store a firehose of data and figure out the right string encoding later. Just dumping the strings with uncertain encoding in `bytea` columns helped us there.

Plus, for some fields it helps with auditability, e.g. when you get raw binary-encoded telemetry from devices in the field, you should store their raw payloads _and_ the parsed data structures that you got from them. Being this paranoid has saved my neck a few times.

The secret is to accept you are not without fault and take measures to be able to correct yourself in the future.
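A minimal sketch of that "store the raw bytes now, decode later" approach, in Python. The `StoredPayload` record and the `ingest` / `decode_later` helpers are hypothetical names used only for illustration; in the database the `raw` field would correspond to a `bytea` column.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StoredPayload:
    raw: bytes                   # exactly what arrived; never modified
    text: Optional[str] = None   # filled in later, once the encoding is known


def ingest(raw: bytes) -> StoredPayload:
    # No decoding at ingest time, so nothing can be lost or mangled here.
    return StoredPayload(raw=raw)


def decode_later(record: StoredPayload, encoding: str) -> StoredPayload:
    # Strict decoding: a wrong guess raises UnicodeDecodeError instead of
    # silently inserting replacement characters.
    record.text = record.raw.decode(encoding)
    return record


record = ingest("café".encode("latin-1"))   # the bytes b"caf\xe9"
try:
    decode_later(record, "utf-8")           # wrong guess fails loudly
except UnicodeDecodeError:
    decode_later(record, "latin-1")         # the raw bytes are still intact
```

Because the raw bytes are kept alongside the decoded text, a wrong encoding guess can be corrected later without data loss.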

emmelaich · 2024-12-08T03:38:58 1733629138

Indeed, one system I dealt with used char instead of blob. The text as stored was riddled with U+FFFD (the Unicode replacement character).

lelanthran · 2024-12-07T13:52:16 1733579536

Yup. It's a huge red flag when a datatype intended to be used for representation of written human language is abused to store something that has no glyph recognisable in any human language.

There's a lot to complain about with nul-terminated strings, but not being able to store arbitrary bytes ain't one of them.

jagged-chisel · 2024-12-07T13:02:08 1733576528

Not everything needs to be a C-string (null-terminated array/sequence of characters.) We are advanced enough with our understanding of Things that we can include metadata along with a chunk of bytes to indicate “this is a ‘string’ and it’s q bytes long and can have any value you want in there.”

That said, I’m with you. And if someone wants nulls inside their “strings” then they probably want blobs.
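The length-prefixed idea above can be sketched in a few lines of Python. The 4-byte big-endian header and the `pack_string` / `unpack_string` names are assumptions made for this example, not any particular wire format.

```python
import struct


def pack_string(data: bytes) -> bytes:
    # 4-byte big-endian length header, then the payload bytes unchanged.
    return struct.pack(">I", len(data)) + data


def unpack_string(buf: bytes) -> bytes:
    # Read the length first, then slice exactly that many bytes.
    (length,) = struct.unpack_from(">I", buf, 0)
    return buf[4:4 + length]


payload = b"abc\x00def"              # a NUL byte in the middle
framed = pack_string(payload)
restored = unpack_string(framed)     # round-trips, NUL included

# A reader that stops at the first NUL, C-string style, sees only a prefix.
c_style_view = payload.split(b"\x00", 1)[0]
```

With the length carried as metadata, no byte value needs to be reserved as a terminator.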

tlarkworthy · 2024-12-08T00:00:41 1733616041

That your JSON deserializer accepted them.
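For instance, Python's standard `json` module decodes an escaped U+0000 without complaint, so the NUL only becomes a problem once the value reaches a store that rejects it. The `reject_nul` guard below is a hypothetical helper, shown as one possible place to catch it.

```python
import json

# Valid JSON: the string contains an escaped U+0000.
decoded = json.loads('"abc\\u0000def"')
has_nul = "\x00" in decoded


def reject_nul(value: str) -> str:
    # Hypothetical guard to run before handing the value to the database.
    if "\x00" in value:
        raise ValueError("string contains U+0000")
    return value
```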

jagged-chisel · 2024-12-07T12:57:57 1733576277

> you also can't have the nul character in a string …

Let me introduce you to blob…

hamilyon2 · 2024-12-07T12:10:50 1733573450

Yes, skip-index scans require custom SQL now.
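To illustrate what such a scan does, here is a rough Python sketch that emulates it over a sorted list standing in for a B-tree index on `(a, b)`. It is not Postgres code; the function names are made up for this example. The custom SQL version is usually written as a recursive query that repeats the same "jump to the next distinct value" step.

```python
def first_index_after(index, a, lo):
    # Binary search for the first position whose leading column is > a.
    hi = len(index)
    while lo < hi:
        mid = (lo + hi) // 2
        if index[mid][0] <= a:
            lo = mid + 1
        else:
            hi = mid
    return lo


def distinct_leading_values(index):
    # index: list of (a, b) tuples sorted by (a, b).
    result = []
    pos = 0
    while pos < len(index):
        a = index[pos][0]
        result.append(a)
        # Skip every remaining entry that shares this leading value.
        pos = first_index_after(index, a, pos)
    return result


rows = sorted([(1, "x"), (1, "y"), (2, "z"), (5, "a"), (5, "b"), (5, "c")])
leading = distinct_leading_values(rows)
```

Each distinct leading value costs one binary search instead of a walk over all of its duplicates, which is the point of skipping.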

I am also a bit annoyed by cache-like uses not being first-class. Unlogged tables get you far, temporary tables are nice, but it all still feels like a hurdle: awkward, and not what you actually need.

drtgh · 2024-12-07T16:25:32 1733588732

> I am also a bit annoyed by cache-like uses not being first-class.

Since what happened recently with Redis[1], the first thing I thought about was Postgres, but the performance[2] difference is too noticeable, so one has to look for other alternatives, and I'm not very confident in those because they may follow the same "Redis attitude" (ValKey, DragonflyDB, KeyDB, Kvrocks, MinIO, RabbitMQ, etc.).

It would be nice if these cache-like uses within Postgres got a tiny push.

[1] https://news.ycombinator.com/item?id=42239607

[2] https://medium.com/redis-with-raphael-de-lio/can-postgres-re...

    XXXXX achieves a latency of 0.095 ms, which is approximately 85% faster than the 0.679 ms latency observed for Postgres’ unlogged table.
    
    It also handles a much higher request rate, with 892.857,12 requests per second compared to Postgres’ 15.946,02 transactions per second.
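As a quick check of the quoted figures, the arithmetic can be redone in Python. This assumes the numbers use European formatting, so "892.857,12" is read as 892857.12 and "15.946,02" as 15946.02; that reading is an assumption, not something the quote states.

```python
fast_latency_ms = 0.095        # latency quoted for the faster system
postgres_latency_ms = 0.679    # latency quoted for the Postgres unlogged table

# Relative reduction in latency, as a fraction of the Postgres figure.
latency_reduction = (postgres_latency_ms - fast_latency_ms) / postgres_latency_ms

# Throughput figures, with the thousands/decimal separators read as European.
fast_rps = 892857.12
postgres_tps = 15946.02
throughput_ratio = fast_rps / postgres_tps

print(f"latency reduction: {latency_reduction:.1%}")
print(f"throughput ratio:  {throughput_ratio:.1f}x")
```

Under that reading, the latency reduction comes out at about 86%, in line with the quoted "approximately 85%", and the throughput gap is roughly 56x.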