
One possible drawback of this kind of system is performance (or broker CPU) getting dragged down by crazy/bad filtering queries.

Normally, those issues are solved the usual way (monitor, identify, fix). It’s rarer to see systems that proactively detect/reject costly arbitrary queries when they’re issued, though.

Proactively detecting potentially bad SQL queries in RDBMSes relies on table statistics (which can’t be known for streams) or on query text/plan analysis heuristics (hairy, subjective, and error-prone).

But it just occurred to me: could RabbitMQ’s choice of Erlang enable the easy rejection of query plans above a certain cost?

Could the BEAM easily be made to reject a query plan whose reduction count exceeds a user-specified threshold, assuming the plan (or at least a worst-case version of it) can be compiled ahead of time into a loopless/unrolled chunk of BEAM bytecode?

That might be interesting, if possible. Most runtimes don’t have user-surfaced equivalents of reduction counts, so there might be some mechanical sympathy in RabbitMQ’s case.
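
To make that concrete, here’s a rough sketch of the runtime-enforcement flavor of the idea (rather than ahead-of-time plan rejection): run the filter evaluation in its own process, poll its reduction count, and kill it past a threshold. Everything below is hypothetical and illustrative, not RabbitMQ code.

    -module(reduction_guard).
    -export([run_bounded/2]).

    %% Evaluate Fun in its own process and kill it if it burns more than
    %% MaxReductions (the BEAM's unit of scheduled work).
    run_bounded(Fun, MaxReductions) ->
        {Pid, Ref} = spawn_monitor(fun() -> exit({done, Fun()}) end),
        watch(Pid, Ref, MaxReductions).

    watch(Pid, Ref, MaxReductions) ->
        receive
            {'DOWN', Ref, process, Pid, {done, Result}} -> {ok, Result};
            {'DOWN', Ref, process, Pid, Reason}         -> {error, Reason}
        after 1 ->
            %% Poll the scheduler-maintained reduction count for the worker.
            %% Polling granularity means a small overshoot is possible.
            case erlang:process_info(Pid, reductions) of
                {reductions, R} when R > MaxReductions ->
                    erlang:exit(Pid, kill),
                    erlang:demonitor(Ref, [flush]),
                    {error, reduction_limit_exceeded};
                _ ->
                    watch(Pid, Ref, MaxReductions)
            end
        end.

Rejecting a compiled plan before running it is the harder version of this, since you’d have to bound reductions statically instead of just observing them.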


What an incredibly useful feature. Besides the obvious developer-experience benefits, it’s huge for network-bound use cases: heavily optimized uses of RabbitMQ (or less-optimized uses with really big message payloads) end up bottlenecked on, or paying lots of money for, broker network capacity, since a message’s bytes must cross the wire two or more times (publish, consume, maybe replication) for it to be processed. Having the broker filter on the consumer’s behalf, instead of shipping every message to the consumer to be discarded there, helps a lot with that. Workloads should still use separate queues/topics/streams whenever they can, of course (I’m sure there will be some one-topic-for-everything abuses enabled by the combination of poor architectural foresight + SQL filtering, but such is life).

I am confused, though: why does the bloom filter … er, filter still need to be manually specified by the consumer (filterValues in the example Java)?

As far as the broker filtering query evaluation logic is concerned, bloom-filter enabled fields are just indexes; why can’t the SQL-filter query planner automatically make use of them?

I’m probably missing something, but it seems like a very light query-plan optimization pass would not be hard to implement here; there’s only one kind of index, and it can only be used with equality comparisons, so the implementation complexity doesn’t seem like it would be too bad versus needing a fully general optimizing SQL planner.
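
For what it’s worth, the pass I’m imagining is roughly “walk the parsed filter expression, collect literals compared for equality with the bloom-indexed field, and hand those to the bloom filter automatically.” A toy sketch, against a completely made-up AST shape (nothing to do with RabbitMQ’s actual representation):

    -module(filter_index).
    -export([bloom_values/2]).

    %% Toy sketch over a hypothetical parsed-filter AST. Returns the literal
    %% values that can safely seed the bloom-filter prefilter for Field, or
    %% [] if the index can't be used.
    bloom_values({'=', {field, F}, {literal, V}}, F) -> [V];
    bloom_values({'=', {literal, V}, {field, F}}, F) -> [V];
    bloom_values({'in', {field, F}, Literals}, F) -> Literals;
    bloom_values({'and', L, R}, F) ->
        %% Either conjunct's equality constraint is a safe (superset) prefilter.
        case bloom_values(L, F) of
            []     -> bloom_values(R, F);
            Values -> Values
        end;
    bloom_values({'or', L, R}, F) ->
        %% Only safe if both branches constrain the field; otherwise a matching
        %% message could carry any value, and we must not skip chunks.
        case {bloom_values(L, F), bloom_values(R, F)} of
            {[], _}  -> [];
            {_, []}  -> [];
            {Ls, Rs} -> Ls ++ Rs
        end;
    bloom_values(_Other, _Field) -> [].

An empty result just means “behave as if no filterValues were given,” which is what happens today anyway.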


You miss GP’s point. They’re not assuming good faith, they’re pointing out that the government already knows identity credentials and can, encrypted or not, quite easily correlate digital activity with those credentials.

The question isn’t whether the government can/will identify and track you. They do, in good faith or bad. This is unfortunate and attempts to allow them to decrypt or acquire additional data about citizens’ activities (like chat control) should be opposed, but identity/activity tracking is omnipresent and irreversible.

The question is whether identity credentials should be available that reduce the risk of additional credential theft or bad-faith action (e.g. by other entities stealing credentials, like passports, that were never designed to be secure for digital use).


Sure, but that is as vacuously true as saying “router keeps getting hacked? Just unplug it from the internet.”

Huge numbers of businesses want to use AI in the “hey, watch my inbox and send bills to all the vendors who email me” or “get a count of all the work tickets closed across the company in the last hour and add that to a spreadsheet in sharepoint” variety of automation tasks.

Whether those are good ideas or appropriate use-cases for AI is a separate question.


Would LLMs help with that? Seems like they could be phished as well.

Also, there’s a difference between “know how to be secure” and “actually practice what is known”. You’re right that non-AI security often fails at the latter, but the industry has a pretty good grasp on how to secure computer systems.

AI systems do not have a practical answer to “how to be secure” yet.


> The fact that there's no foolproof way to distinguish instruction tokens from data tokens is not careless

Repeat that over to yourself again, slowly.

> it's a fundamental epistemological constraint that human communication suffers from as well

Which is why reliability and security in many areas increased when those areas used computers to automate previously-human processes. The benefit of computer automation isn’t just in speed: the fact that computer behavior can easily be made deterministically repeatable and predictable is huge as well. AI fundamentally does not have that property.

Sure, cosmic rays and network errors can compromise non-AI computer determinism. But if you think that means AI and non-AI systems are qualitatively the same, I have a bridge to sell you.

> Saying that "software engineers figured out these things decades ago" is deep hubris

They did, though. We know both how to increase the likelihood of secure outcomes (best practices and such) and how to guarantee secure behavior outright. For example: using a SQL driver’s bind parameters to distinguish instruction tokens from data tokens is, indeed, a foolproof process (I’m not talking about injection via string-built queries here, but about how queries are sent to the server with data bound separately).
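
To illustrate the distinction, here’s a sketch using the epgsql Postgres driver’s simple vs. extended-query calls (assume Conn is an open connection):

    -module(users_db).
    -export([lookup_unsafe/2, lookup_safe/2]).

    %% Instructions and data share one string; a crafted Name can rewrite the
    %% query. Shown only for contrast.
    lookup_unsafe(Conn, Name) ->
        Sql = lists:flatten(
                io_lib:format("SELECT * FROM users WHERE name = '~s'", [Name])),
        epgsql:squery(Conn, Sql).

    %% Query text and the bound value travel separately over the wire, so
    %% whatever Name contains is only ever treated as data, never as SQL.
    lookup_safe(Conn, Name) ->
        epgsql:equery(Conn, "SELECT * FROM users WHERE name = $1", [Name]).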

People don’t always do security well, yes, but they don’t always put out their campfires either. That doesn’t change the fact that putting out a campfire is guaranteed to stop that fire from burning the forest down. We know how to prevent this stuff, fully, in most non-AI computation.


>> The fact that there's no foolproof way to distinguish instruction tokens from data tokens is not careless

> Repeat that over to yourself again, slowly.

Try using less snark.

And if you have a fundamental breakthrough in AI that gets around this, and demonstrates how "careless" AI researchers have been in overlooking it, then please share.


The best docs are the ones you can trust are accurate. The second best docs are ones that you can programmatically validate. The worst docs are the ones that can’t be validated without lots of specialized effort.

Python’s type hints are in the second category.


I’d almost switch the order here! In a world with agentic coding agents that can constantly check for type errors from the language server powering the errors/warnings in your IDE, and reconcile them against prose in docstrings… types you can programmatically validate are incredibly valuable.

Do you have an example of the first?

Languages with strong static type systems

Is there a mainstream language where you can’t arbitrarily cast a variable to any other type?

> whether that would still be true had we approached public transit and health care subsidies the same way European countries did?

Why wouldn’t it? I’ve heard many different explanations for the US’s wealth, but never that it’s wealthy because it saves on expenditures. There is also a solid case to be made that healthcare specifically, if socialized, would drive up productivity and earning power and reduce fiscal risk (and risk aversion) for many demographics, all of which are good for GDP and other measures of a country’s wealth.

As for mass transit? It has costs and benefits too, but they’re a drop in the bucket compared to healthcare costs.


Slower? In top speed maybe, but not in time-to-destination (or, given congested streets, average speed).

Trains “require” you to make a transfer? Depends on your city, I guess; many train systems are hub-and-spoke-like enough (and dense enough) that common commutes don’t require any transfers. Also, I’m curious whether bus-centric mass transit requires more or fewer transfers than train-centric or hybrid.


> Slower? In top speed maybe, but not in time-to-destination (or, given congested streets, average speed).

Yep. Transit is ALWAYS slower on average compared to cars. It is faster only in a very narrow set of circumstances.

Try an experiment: drop 10 random points inside a city and plot routes between them for cars and transit (you can use the Google Maps API). Transit will be on average 2-3 times slower, even in rush hour.


Yes, and?

The outcome of that approach is that an important service has uniform low costs to direct consumers, many of whom rely on the service for their quality of life, and many of whom would be unable to afford the service if its costs were passed along to them instead of subsidized via government debt and taxes.

In other words, a public service. That’s a good thing.

