Hacker News | bgentry's comments

Developer of River here ( https://riverqueue.com ). I'm curious if you ran into actual performance limitations based on specific testing and use cases, or if it's more of a hypothetical concern. Modern Postgres running on modern hardware and with well-written software can handle many thousands or tens of thousands of jobs per second (even without partitioning), albeit that depends on your workload, your tuning / autovacuum settings, and your job retention time.

Perceived only at this stage, though the kind of volume we’re looking at is 10s to 100s of millions of jobs per day. https://github.com/riverqueue/river/issues/746 talks about some of the same things you mention.

To be clear, I really like the model of riverqueue and will keep going at a leisurely pace since this is a personal time interest at the moment. I’m sick of celery and believe a service is a better model for background tasks than a language-specific tool.

If you guys were to build http ingestion and http targets I’d try and deploy it right away.


Ah, so that issue is specifically related to a statistics/count query used by the UI and not by River itself. I think it's something we'll build a more efficient solution for in the future because counting large quantities of records in Postgres tends to be slow no matter what, but hopefully it won't get in the way of regular usage.

> Perceived only at this stage, though the kind of volume we’re looking at is 10s to 100s of millions of jobs per day.

Yeah that's a little over 100 jobs/sec sustained :) Shouldn't be much of an issue on appropriate hardware and with a little tuning, in particular to keep your jobs table from growing to more than a few million rows and to vacuum frequently. Definitely hit us up if you try it and start having any trouble!
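For reference, the back-of-the-envelope conversion behind that figure can be sketched as below. It assumes jobs arrive evenly over a 24-hour day, which real workloads rarely do, so peak rates would likely be higher. The function name is made up for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

def jobs_per_second(jobs_per_day: int) -> float:
    """Average sustained rate, assuming jobs are spread evenly across the day."""
    return jobs_per_day / SECONDS_PER_DAY

low = jobs_per_second(10_000_000)    # low end of "10s of millions" per day
high = jobs_per_second(100_000_000)  # low end of "100s of millions" per day
print(f"{low:.0f} to {high:.0f} jobs/sec")  # prints "116 to 1157 jobs/sec"
```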


You could make that argument for lots of services that have external side effects, but that’s about what happens after the service has been asked to do a thing (to send an email in this case).

However just because an action may be duplicated after the provider has been asked to do a thing, it does not eliminate the value of the provider being able to deduplicate that incoming request and avoiding multiple identical tasks on their end. Without API level idempotency, a single email on the client’s end could turn into many redundant emails at the service provider’s side, each of which could then be subject to those same subsequent duplications at the SMTP layer. And even then, providers can use the Message-Id header to provide idempotency in delivery as many do.

This is an unavoidable consequence of distributed systems where the client may not know if the server ever received or processed the request, and it may also occur due to client-side bugs or retries within their own software.

In other words, API level idempotency can help eliminate all duplication prior to the API; depending on the service, the provider may also be able to eliminate duplication afterward as well. So it’s strictly better than not having it, really not that difficult to implement, and makes it easier for integrators to build a robust integration with you.
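To illustrate what API-level idempotency means on the provider's side, here is a minimal in-memory sketch. `EmailAPI` and its fields are hypothetical and not any real provider's implementation; a real service would need persistent storage and an atomic check-and-store.

```python
from dataclasses import dataclass, field

@dataclass
class EmailAPI:
    """Toy stand-in for a provider's send endpoint (hypothetical)."""
    sent: list = field(default_factory=list)
    _responses: dict = field(default_factory=dict)  # idempotency key -> response

    def send(self, idempotency_key: str, to: str, subject: str) -> dict:
        # A repeated key returns the original response without sending again.
        if idempotency_key in self._responses:
            return self._responses[idempotency_key]
        self.sent.append((to, subject))
        response = {"id": len(self.sent), "status": "queued"}
        self._responses[idempotency_key] = response
        return response

api = EmailAPI()
first = api.send("signup-welcome-42", "user@example.com", "Welcome")
retry = api.send("signup-welcome-42", "user@example.com", "Welcome")
```

Here `retry` is the same response as `first` and only one email is recorded, which is the property a client relies on when it cannot tell whether its first request arrived.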


> makes it easier for integrators to build a robust integration with you

No, don't say 'easier'. It makes it possible to build a robust integration. We need to stop with this notion that omitting idempotency from an API just makes things "more difficult" to develop. Without idempotency, you guarantee that the resulting system is "difficult" to use and full of nasty issues that are waiting for the right conditions to collapse the entire house of cards you've built.

So many SaaS providers have never even heard of idempotency, let alone design it into their APIs. Many people believe you can just sprinkle it on as a library without having to think about it.

All APIs with multiple distributed servers must support idempotency. Refuse to do business with any organisations who do not design this into their APIs!


Hah, I agree, "easier" is too soft :)


> Without API level idempotency, a single email on the client’s end could turn into many redundant emails at the service provider’s side, each of which could then be subject to those same subsequent duplications at the SMTP layer.

Ok so now there’s a 1/100 million chance that the client gets 3 duplicate emails.

I’m not arguing that idempotency is never important. The most popular blog post I’ve ever written is about the 2 generals problem and how idempotency can help.

I’m arguing in this specific instance it doesn’t matter.

As far as I was aware duplicate message-id headers aren’t deduped by every client, but if they are being used for deduplication just expose that in your api and let the caller set it.


> This is what we get allowing mega-corporations control our media.

What mega corporation is the Washington Post part of?


Jeff Bezos himself could be considered a mega-corporation.

We're at the point where the personal wealth of oligarchs such as him has begun eclipsing the wealth of all but the very largest corporations. His own personal wealth would rank him somewhere around 60th on the Fortune 500.


It puts him in the top half of countries, just below Croatia, by national net wealth.


Amazon, yes?


Apparently there are a lot of people confused about this, but no, Amazon does not have ownership over the Washington Post. Jeff Bezos bought it with his own personal funds using an LLC: https://en.wikipedia.org/wiki/Jeff_Bezos#The_Washington_Post


And surely there is no relation between Bezos and Amazon so the part about "This is what we get allowing mega-corporations control our media" is obviously false.


It is literally false in this case, yes. The corporation has no control over WaPo, even though the same individual has some control over both.


Some control?


This is a practically useless distinction.


But a pedantically, technically accurate one, just on brand for this site!


The distinction is not wholly useless, since it provides a frame whereby we can interpret this as evidence that the corporate veil is fictive, and that the "evil corporation" problem is actually caused by evil individuals.

My instinct is that Evil Man theory is as simplistic (and wrong) as Great Man theory: there's probably some better explanation that this also provides evidence for, that I'm missing.


> The corporation has no control over WaPo, even though the same individual has some control over both.

And I'm sure Jeff Bezos is so pure he'd never use his power over WaPo to help Amazon! /s


I think people struggle with the fact that it's not legally so. However, per the article, Bezos is exerting power over WP because Amazon lost contracts to Microsoft (due to WP being critical of Trump). Their balance sheets may not be aligned, but their interests are.


And if Bezos toes the line and does things that Trump likes, then Amazon may benefit.

Trump sees "friends" and "enemies" [0], and doesn't care about actual ownership of shares. "Bezos is a friend and sorted out that WaPo mess, so I'll cut him some slack and kill off that Amazon anti-trust thing" is something we can all picture Trump saying.

[0] Obviously I have no idea what he actually sees or thinks, but this picture seems to match his public pronouncements.


Bezos fully owns WaPo personally. It is not under Amazon


The backlight bleed of LCDs would look pretty awful at night. Also I’m not sure how others feel about it but to me a DIY LED matrix is way cooler than dropping in a prebuilt screen with HDMI input.


> The TV show Silicon Valley lampooned all these ideas a decade ago. The weird thing I’ve noticed is that when Bay Area tech people watch that show, they don’t seem to understand that they’re being made fun of. They think they’re being celebrated. That’s how thick the bubble is.

I've never met a person who didn't understand that this show is satire. Every single tech person I've talked to about Silicon Valley thinks it's funny because of how plausible and yet ridiculous it all is, and because of all the totally accurate details scattered throughout—from golden handcuffs / resting & vesting, down to minor things like which drinks were stocked in the show's office fridges. And I lived in the Bay Area during its entire run, so most of my network is current/former Bay Area tech people.


Everyone I knew who watched the old Office Space movie thought it was funny; I never liked it because it was too much like real life (at the time). Same with the TV show The Office.


Big Bang Theory was all the rage when I was working at CERN. I know exactly what you mean.


I heard one word of that show through my roommate's door and knew it was satire


Oh c'mon, I gotta ask - what word? :)


NipAlert, if you count that as a word, haha


Gotta admit that makes total sense.


Ben Thompson has been covering Intel’s precarious position for over a decade (well before the market finally realized it) and the latest update is not looking good:

> Intel is technically on pace to achieve the five nodes in four years Gelsinger promised (in truth two of those nodes were iterations), but they haven’t truly scaled any of them; the first attempt to do so, with Intel 3, destroyed their margins. This isn’t a surprise: the reason why it is hard to skip steps is not just because technology advances, but because you have to actually learn on the line how to implement new technology at scale, with sustainable yield. Go back to Intel’s 10nm failure: the company could technically make a 10nm chip, they just couldn’t do so economically; there are now open questions about Intel 3, much less next year’s promised 18A.

https://stratechery.com/2024/intel-honesty/


I know that post, but the problem is he is just extrapolating from history. Not a bad thing in absence of real information, but... Well, let's hope he's wrong. :-)


I actually just re-read that whole thread earlier this evening for unrelated reasons: https://news.ycombinator.com/item?id=31508000

In short, the Go module proxy causes an excessive traffic volume on git VCS sources with frequent clones of unchanged repos. Regardless of whether or not the developer is/was always reasonable in how he discussed this, he was absolutely right about this being a hostile behavior from the official Go proxy that is the result of bad/insufficient engineering. The team's suggestions to simply stop refreshing his one domain were also not sufficient given that the problem clearly impacts all Go module VCS hosts.

The developer also appeared to be banned in a way that violated the Go CoC's own provisions around fair notice and a proper hearing, which is super disappointing to see.


Oh man, was Drew banned from all Go spaces, or just from the issue tracker as he mentioned? He seems to draw ire, although whenever I actually read what he writes, he usually makes a lot of sense. I imagine there are examples of him being abrasive, but it usually seems like he values being thoughtful and kind.

I was actually thinking of someone else: https://news.ycombinator.com/item?id=34311643


River ( https://riverqueue.com ) is a Postgres background job engine written in Go, which also has insert-only clients in other languages. Currently we have these for Ruby and Python:

https://github.com/riverqueue/riverqueue-ruby

https://github.com/riverqueue/riverqueue-python


Having recently adopted Resend and skimmed a bunch of different email APIs, I'm still waiting to find a single provider whose API supports Stripe-style idempotency [1] so that I can guarantee I don't send the same email through their API multiple times. I'd like to confidently avoid accidentally spamming a user if, e.g., a background job retries multiple times due to an unrelated error, or merely from failing to receive the API response that an email was sent/created successfully.
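On the client side, the retry case only works if every attempt of the same logical email carries the same key. A rough sketch, assuming a provider that accepted a Stripe-style `Idempotency-Key` header; the helper names and the job ID / template / recipient inputs are hypothetical:

```python
import hashlib

def idempotency_key(job_id: int, template: str, recipient: str) -> str:
    """Derive the same key on every retry of the same logical email."""
    raw = f"{job_id}:{template}:{recipient}".encode()
    return hashlib.sha256(raw).hexdigest()

def build_headers(job_id: int, template: str, recipient: str) -> dict:
    # Header name follows Stripe's convention; an email provider's may differ.
    return {"Idempotency-Key": idempotency_key(job_id, template, recipient)}
```

Because the key is derived from the job rather than generated per request, a retried background job sends an identical header and the provider can recognize the duplicate.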

Plunk's API does not appear to offer any such feature: https://docs.useplunk.com/api-reference/transactional/send

Unfortunately neither does Resend, Sendgrid, Postmark, etc.

[1]: https://docs.stripe.com/api/idempotent_requests


Yep I was shocked that SendGrid doesn't do this

I don't even expect them to keep a list of IDs forever, just some best effort like "we don't send anything with the same id twice in a one-hour window"
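Something like this best-effort window could be sketched as follows. It is an in-memory illustration only; the class name and the injectable clock are assumptions made for the example, and a real provider would need shared, durable storage.

```python
import time

class WindowedDeduper:
    """Best-effort dedup: reject an ID seen within the last `window_seconds`."""

    def __init__(self, window_seconds: float = 3600.0, clock=time.monotonic):
        self.window = window_seconds
        self.clock = clock
        self._seen = {}  # message id -> time it was first accepted

    def should_send(self, message_id: str) -> bool:
        now = self.clock()
        # Forget IDs whose window has expired so memory stays bounded.
        self._seen = {k: t for k, t in self._seen.items() if now - t < self.window}
        if message_id in self._seen:
            return False
        self._seen[message_id] = now
        return True

# Usage with a fake clock so the one-hour window can be shown without waiting.
fake_now = [0.0]
dedupe = WindowedDeduper(clock=lambda: fake_now[0])
a = dedupe.should_send("msg-1")  # first time: send
b = dedupe.should_send("msg-1")  # same ID inside the window: skip
fake_now[0] = 3601.0
c = dedupe.should_send("msg-1")  # window expired: send again
```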

Gmail's API did support this I believe, and then ofc my org transitioned to Microsoft so my little homemade email service quit working


Probably too late, but we support this over at mailpace.com

https://docs.mailpace.com/guide/idempotency


Hi. I recently took over engineering at Postmark. Noted! Thanks for the feedback.


While you are here, I have been asking for years to have better access control on API keys so they can only use assigned servers. So my staging can't send prod emails...


API tokens are server specific aren't they?


They are!


I understand this change might be a large undertaking to implement across the entire API, and although it would be good to do everywhere, it’s primarily just the send email API that needs it.


I like the idea. Staff engineer and I discussed a bit. Added something to our backlog so we don't lose track of it.


Be sure to check out Waypoint (usewaypoint.com). It's an email API with a tightly integrated template builder and has idempotent requests (https://www.usewaypoint.com/docs/idempotent-requests).


Thank you! I'm not sure how this didn't come up in what I felt was a pretty comprehensive round of Google searching, but it looks like a fantastic option. If only they had a free allotment or something <$20/mo for a small number of emails :) But otherwise this looks like the solution I was looking for.


This is exactly what I've been wrestling with recently.. it's so disappointingly absent in all the offerings out there.

The solution I ended up with was to build my own pseudo-idempotency around Postmark. It helps that Postmark at least has proper persistence so you can query their API to check if you've already sent a certain email. I had to move away from Mailchimp because, if an email gets queued for some reason, it isn't reflected as sent in the API so there's no way of knowing whether an email is hiding in "queued limbo". Postmark doesn't have this problem.

The only caveat is that there is a delay between sending an email and it being reflected in the API (seems pretty standard across the different services). This means I had to implement a simple database backed lock to make sure I never try to send the same email more than once in quick succession.
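A database-backed lock of this kind might look roughly like the sketch below. It uses SQLite and a primary-key constraint purely for illustration; the table and function names are made up, and this is not the actual implementation described here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE email_locks (email_key TEXT PRIMARY KEY)")

def try_claim(conn: sqlite3.Connection, email_key: str) -> bool:
    """Return True only for the first caller to claim this email key."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO email_locks (email_key) VALUES (?)", (email_key,)
            )
        return True
    except sqlite3.IntegrityError:
        # The primary key already exists: someone else is sending this email.
        return False

first = try_claim(conn, "order-1001-receipt")
second = try_claim(conn, "order-1001-receipt")
```

The first claim succeeds and the second is rejected, so only one sender proceeds while the provider's API catches up. Rows could be cleared once the email shows up as sent on the provider's side.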

It's kind of ridiculous, but.. I couldn't find any alternative.

Edit:

Wonder if it would be worth wrapping something like this up into some kind of a self-hostable proxy service. You'd provide the idempotency key in a header and it would do the rest


Hi. Thanks for the feedback. Recently took over Postmark's engineering and interested in seeing where we can do better as we keep building.


Resend is also built on top of SES, and yet it's an extra layer of hosted services on top of it. So this is actually a self-hosted alternative to, e.g., the functionality that Resend offers, even though there are still other underlying dependencies. Seems like a reasonable way to describe it IMO.

