regecks's comments

I’d be interested to see a benchmark of actually selecting rows by manually specifying the partition vs not.

This benchmark seems to be pure computation of the hash value, which I don’t think is helpful to test the hypothesis. A lot can happen at actual query time that this benchmark does not account for.


That's fair. I did run an EXPLAIN ANALYZE, which shows the time it takes for planning and execution, and sending the query directly to the table outperforms that.

More to come soon.
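
For anyone who wants to reproduce this, here's a rough sketch of the comparison in Go (table name, partition name, and connection string are all hypothetical; the same check can be done directly in psql with EXPLAIN ANALYZE):

    package main

    import (
        "database/sql"
        "fmt"
        "time"

        _ "github.com/lib/pq" // Postgres driver
    )

    // timeQuery runs a single-row lookup and prints how long the round trip took.
    func timeQuery(db *sql.DB, label, query string, args ...any) {
        start := time.Now()
        var id int64
        if err := db.QueryRow(query, args...).Scan(&id); err != nil {
            fmt.Println(label, "error:", err)
            return
        }
        fmt.Printf("%s: %v\n", label, time.Since(start))
    }

    func main() {
        // Hypothetical DSN and hash-partitioned table "events"; "events_p3" is
        // assumed to be the partition that key 42 hashes to.
        db, err := sql.Open("postgres", "postgres://localhost/bench?sslmode=disable")
        if err != nil {
            panic(err)
        }
        defer db.Close()

        // Through the parent: the planner has to prune partitions at plan time.
        timeQuery(db, "via parent   ", "SELECT id FROM events WHERE id = $1", 42)
        // Directly against the partition: no pruning work needed.
        timeQuery(db, "via partition", "SELECT id FROM events_p3 WHERE id = $1", 42)
    }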


We’re looking for a distributed Go cache.

We don’t want to round trip to a network endpoint in the ideal path, but we run multiple instances of our monolith and we want a shared cache tier for efficiency.

Any architecture/library recommendations?


To be honest, I'm not sure I can recommend anything specific here.

1. How much data do you have and how many entries? If you have lots of data with very small records, you might need an off-heap based cache solution. The only ready-made implementation I know is Olric [1].

2. If you can use an on-heap cache, you might want to look at groupcache [2]. It's not "blazingly-fast", but it's battle-tested. Potential drawbacks include LRU eviction and lack of generics (meaning extra GC pressure from using `interface{}` for keys/values). It's also barely maintained, though you can find active forks on GitHub. A minimal usage sketch follows after the links below.

3. You could implement your own solution, though I doubt you'd want to go that route. Architecturally, segcache [3] looks interesting.

[1]: https://github.com/olric-data/olric

[2]: https://github.com/golang/groupcache

[3]: https://www.usenix.org/conference/nsdi21/presentation/yang-j...
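
For (2), a minimal groupcache sketch, assuming the current golang/groupcache API (peer addresses, group name, and cache size are made up; older releases pass a groupcache.Context instead of context.Context):

    package main

    import (
        "context"
        "fmt"
        "log"
        "net/http"

        "github.com/golang/groupcache"
    )

    func main() {
        // Each monolith instance registers itself and its peers; groupcache
        // shards keys across instances and fills misses via the getter.
        self := "http://10.0.0.1:8080" // hypothetical addresses
        pool := groupcache.NewHTTPPool(self)
        pool.Set(self, "http://10.0.0.2:8080", "http://10.0.0.3:8080")
        go func() { log.Fatal(http.ListenAndServe(":8080", pool)) }()

        group := groupcache.NewGroup("users", 64<<20, groupcache.GetterFunc(
            func(ctx context.Context, key string, dest groupcache.Sink) error {
                // Cache miss: load from the source of truth (DB, upstream API, ...).
                return dest.SetString("value-for-" + key)
            }))

        var val string
        if err := group.Get(context.Background(), "user:42", groupcache.StringSink(&val)); err != nil {
            log.Fatal(err)
        }
        fmt.Println(val)
    }

Note that keys owned by another instance are still fetched from that peer over HTTP (with hot keys mirrored locally), so it's not entirely network-free.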


Olric is awesome, we've been using it for 2 years in prod. No complaints.


Otter can be used as the backing store with groupcache-go, which is a fork of the original groupcache: https://github.com/groupcache/groupcache-go#pluggable-intern...


groupcache [https://github.com/golang/groupcache] has been around for some time now.


The original groupcache is basically unmaintained, but there are at least two forks that have carried on active development and support additional nice features (like eviction); they should probably be preferred for most projects.

https://github.com/groupcache/groupcache-go


It's very limited in scope, but if it solves your needs it would be the way to go.


I'm insanely fascinated by Groupcache. It's such a cool idea.


Hm, without more details on the use case, and assuming no "round trip to a network" means everything is running on a single host, I see a couple of options:

1) Shared memory - use a cache/key-value lib which allows you to swap the backend to some shmem implementation

2) File-system based - managing concurrent writes is the challenge here, maybe best to use something battle tested (sqlite was mentioned in a sibling)

3) Local sockets - not strictly "no network", but at least no inter-node communication. Start valkey/redis and talk to it via a local socket? (Sketch below.)

Would be interested in the actual use case though. If the monolith is written in anything even slightly modern, the language/runtime should give you primitives to parallelize over cores without worrying about something like this at all... And when it comes to horizontal scaling across multiple nodes, there is no avoiding networking anyway.
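
To illustrate option 3, a minimal sketch using go-redis over a Unix domain socket (the socket path is an assumption; valkey speaks the same protocol):

    package main

    import (
        "context"
        "fmt"
        "time"

        "github.com/redis/go-redis/v9"
    )

    func main() {
        ctx := context.Background()

        // Talk to a local valkey/redis instance over its Unix socket instead of
        // TCP: still shared between all processes on the host, but no network
        // round trip. The socket path here is hypothetical.
        rdb := redis.NewClient(&redis.Options{
            Network: "unix",
            Addr:    "/run/valkey/valkey.sock",
        })

        if err := rdb.Set(ctx, "greeting", "hello", time.Minute).Err(); err != nil {
            panic(err)
        }
        val, err := rdb.Get(ctx, "greeting").Result()
        if err != nil {
            panic(err)
        }
        fmt.Println(val)
    }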


Perhaps a NATS server colocated on each monolith server (or even embedded in your app, if it is written in Go, meaning that all communication is in-process), and use NATS KV?

Or if you just want it all to be in memory, perhaps use some other non-distributed caching library and do the replication via NATS? I'm sure there are lots of gotchas with something like that, but Marmot is an example of doing SQLite replication via NATS JetStream.

Edit: actually, you can set JetStream/KV to use in-memory rather than file persistence, so it could do the job of Olric or of rolling your own distributed KV via NATS. https://docs.nats.io/nats-concepts/jetstream/streams#storage...
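
Something like this, as a rough sketch against the nats.go JetStream KV API (bucket name, TTL, and the assumption of a locally running server are made up for illustration):

    package main

    import (
        "fmt"
        "time"

        "github.com/nats-io/nats.go"
    )

    func main() {
        // Connect to a NATS server colocated on the same host (it could also be
        // embedded in-process via the nats-server/v2/server package).
        nc, err := nats.Connect(nats.DefaultURL)
        if err != nil {
            panic(err)
        }
        defer nc.Drain()

        js, err := nc.JetStream()
        if err != nil {
            panic(err)
        }

        // In-memory KV bucket: no file persistence, entries expire after the TTL.
        kv, err := js.CreateKeyValue(&nats.KeyValueConfig{
            Bucket:  "cache",
            Storage: nats.MemoryStorage,
            TTL:     time.Minute,
        })
        if err != nil {
            panic(err)
        }

        if _, err := kv.Put("user:42", []byte(`{"name":"ada"}`)); err != nil {
            panic(err)
        }
        entry, err := kv.Get("user:42")
        if err != nil {
            panic(err)
        }
        fmt.Println(string(entry.Value()))
    }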


Since you mention no network endpoint, I assume it's all on a single server. If so, have you considered SQLite? Assuming your cache is not massive, the file is likely to end up in the filesystem cache, so most reads will come from memory, and writes on a modern SSD will be fine as well.

It's an easy-to-understand system built on a well battle-tested library, and getting rid of the cache is easy: delete the file.

EDIT: I will say for most use cases, the database cache is probably plenty. Don't add power until you really need it.
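
As a sketch of what that could look like from Go (file path, schema, and WAL setting are assumptions, using the mattn/go-sqlite3 driver):

    package main

    import (
        "database/sql"
        "fmt"
        "time"

        _ "github.com/mattn/go-sqlite3" // SQLite driver
    )

    func main() {
        // One shared file on the host; SQLite handles cross-process locking,
        // and WAL mode keeps readers from blocking the writer.
        db, err := sql.Open("sqlite3", "file:/var/tmp/app-cache.db?_journal_mode=WAL")
        if err != nil {
            panic(err)
        }
        defer db.Close()

        if _, err := db.Exec(`CREATE TABLE IF NOT EXISTS cache (
            key        TEXT PRIMARY KEY,
            value      BLOB,
            expires_at INTEGER
        )`); err != nil {
            panic(err)
        }

        // Set with a TTL (upsert).
        _, err = db.Exec(`INSERT INTO cache(key, value, expires_at) VALUES(?, ?, ?)
            ON CONFLICT(key) DO UPDATE SET value = excluded.value, expires_at = excluded.expires_at`,
            "user:42", []byte(`{"name":"ada"}`), time.Now().Add(time.Minute).Unix())
        if err != nil {
            panic(err)
        }

        // Get, skipping expired entries (expired rows can be purged lazily).
        var val []byte
        if err := db.QueryRow(`SELECT value FROM cache WHERE key = ? AND expires_at > ?`,
            "user:42", time.Now().Unix()).Scan(&val); err != nil {
            panic(err)
        }
        fmt.Println(string(val))
    }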


Could you add a bit more to the “distributed cache” concept without a “network endpoint”? Would this mean running multiple processes of the same binary with a shared memory cache on a single system?

If so, that’s not how I’d normally think of a distributed cache. When I think of a distributed cache, I’m thinking of multiple instances, likely (but not necessarily) running on multiple nodes. So, I’m having a bit of a disconnect…


LRU in memory backed by shared ElastiCache.


It can't be shared without networking so I am not sure what you mean. Are you sure you need it to be shared?


I haven’t benchmarked this, but I have recently benchmarked Spark Streaming vs self-rolled Go vs Bento vs RisingWave (which is also in Rust) and RW matched/exceeded self-rolled, and absolutely demolished Bento and Spark. Not even in the same ballpark.

Highly recommend checking RisingWave out if you have real time streaming transformation use cases. It’s open source too.

The benchmark was some high throughput low latency JSON transformations.


Thanks for your recommendation.


It works, but it shows phone numbers rather than contact names, and you can’t assign a name to a number without giving access to your entire contacts … it ticks me off.


Contact scoping on GrapheneOS solves that.


You see userpics though. Works for me...


Ah right, fair enough.


What’s this “neural information retrieval system” thing about?

I’m just hacking away and presenting the LLM with some JSON data from our metrics database and making it answer user questions as a completion.

Is this embedding thing relevant for what I’m doing? Where should I start reading?


I always set `RestartSec`.


That, plus restricting the number of restarts within an interval, is good.

You can then also set "OnFailure" to activate another unit if the failure state is reached, e.g. to send a notification.

E.g.:

    [Unit]
    ...
    OnFailure=notify-failure@%n.service

    [Service]
    Type=simple
    Restart=on-failure
    RestartSec=5
    ..
    StartLimitBurst=5
    StartLimitIntervalSec=300


No, it isn’t. Browser root programs have certificate transparency (SCT embedding) requirements which would immediately reveal to the world if an ISP started to use a trusted root to MITM its users.


I think the title buries the most horrifying part of this. The HiCA certificate authority is relying on an RCE to do an end-run around the semantics of the ACME HTTP-01 validation method.

Fucked up and they should be booted from every root program for this.


They aren't in any root programs. They're just taking certificate requests and relaying them to real CAs, which is why they need to exploit an RCE in the ACME client, since the ACME client wouldn't otherwise be able to complete the validations required by the actual CA.


When confronted they just flat out shut down the service. They also donated $1000 to the project, and they've redirected requests to their payment site to the US White House's website, and they're from China.

They were also suggesting that users run the utility as root...

All really shady...


Wow, that's... bold.


There is years and years and years of amazing content and community on Reddit. Rallying people around a replacement looks challenging. I am hopeful that it happens.


ELI5 OpenXLA vs TensorRT? Are they solving the same problem, just that the former is not married to NVIDIA devices?


They're solving the same "high level problem", but with very different approaches.

TensorRT is proprietary to Nvidia and Nvidia hardware. You'd take a {PyTorch, TensorFlow, <insert some other ML framework>} model and "export / convert" it into essentially a binary. Assuming all goes well (and in practice it rarely does, at least on the first try - more on this later), you now automatically leverage other Nvidia card features such as Tensor cores and can serve a model that runs significantly faster.

The problem is TensorRT being exclusive to Nvidia. The APIs for more advanced ML techniques like deep learning optimization require significant lock-in, if they are even available in the first place. And all of this assumes they work as documented.

OpenXLA (and other players in the ecosystem like TVM) aim to "democratize" this so there is more support both upstream (# of supported ML frameworks) and downstream (# of hardware accelerators other than Nvidia). It's yet another layer or two that ML compiler engineers will need to stitch together, but once implemented, they can in theory apply a lot of optimization techniques largely independently of the hardware targets underneath.

Note that further down in the article they mention other compiler frameworks like MLIR. You can then hypothetically lower (compiler terminology) it to a TensorRT MLIR dialect that then in turn runs on the Nvidia GPU.


I still don't fully grasp what XLA is. Where does XLA sit relative to CUDA, ROCm, OpenVINO? Against ONNX/ONNX Runtime? Against OpenAI Triton?


basically all correct but

>You can then hypothetically lower (compiler terminology) it to a TensorRT MLIR dialect that then in turn runs on the Nvidia GPU.

There's no TensorRT dialect (there are nvgpu and nvvm dialects), nor would there be, as TensorRT is primarily a runtime (although arguably dialects like omp and spirv basically model runtime calls).


Good catch and good point. What I was thinking of was the NVVM dialect. You're right that TensorRT is mostly a runtime.


TensorFlow is also a runtime, yet we model its dataflow graph (the input to the runtime) as a dialect, same for ONNX. TensorRT isn't that different actually.


OpenXLA is an open-source library for accelerating linear algebra computations on a variety of hardware platforms, while TensorRT is a proprietary library from NVIDIA that's specifically designed for optimizing neural network inference performance on NVIDIA GPUs.


openxla is an ML-ish compiler ecosystem built primarily around mlir that can target (through the nvptx backend in llvm) and run on nvidia devices (on iree). tensorrt is a runtime for cuda programs. certainly they have features in common as a reflection of their common goals ("fast nn program training/inference"), but the scope of tensorrt is much narrower.

