Show HN: Streamdal – an open-source tail -f for your data (github.com/streamdal)
148 points by dsies on Oct 31, 2023 | 37 comments
Hey there! This is Dan and Ustin (@uzarubin), and we want to share something cool we've been working on for the past year - an open-source `tail -f` for your data, with a UI. We call it "Streamdal" which is a word salad for streaming systems (because we love them) and DAL or data access layer (because we’re nerds).

Here's the repo: https://github.com/streamdal/streamdal

Here's the site: https://streamdal.com

And here's a live demo: https://demo.streamdal.com (github repo has an explanation of the demo)

— — —

THE PROBLEM

We built this because current observability tooling can't give you real-time insight into the actual data your software is reading or writing, which means it takes longer to identify issues and longer to resolve them. That's time, money, and customer satisfaction at stake.

Want to build something in-house? Prepare to dedicate a team, months of development time, and a lot of money to get it to production. Then be ready to keep engineers around to babysit your new monitoring tool instead of working on your product.

— — —

THE BASIC FLOW

So, wtf is a “tail -f for your data”? What we mean is this:

1. We give you an SDK for your language, a server, and a UI.

2. You instrument your code with `StreamdalSDK.Process(yourData)` anytime you read or write data in your app.

3. You deploy your app/service.

4. Go to the provided UI (or run the CLI app) to peek into what your app is reading or writing, like with `tail -f`.

And that's basically it. There's a bunch more functionality in the project, but we find this to be the most immediately useful part. Every developer we've shown this to has said "I wish I had this at my gig at $company" - and we feel exactly the same. We're devs, and this is what we've wanted hundreds of times: a way to quickly look at the data our software is producing in real time, without having to jump through any hoops.
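To make step 2 a bit more concrete, here's a rough sketch of what instrumenting a consumer and a producer could look like. The names and shapes below are illustrative only, not the actual SDK API - check the per-language SDK readmes for the real calls:

```typescript
// Illustrative only - not the real Streamdal SDK API.
declare const streamdal: {
  process(input: {
    component: string;
    operation: "consume" | "produce";
    data: Uint8Array;
  }): Promise<Uint8Array>;
};
declare function saveOrder(data: Uint8Array): Promise<void>; // your existing write, unchanged

async function handleMessage(rawMessage: Uint8Array): Promise<void> {
  // Data you just read (e.g. from a queue) goes through the SDK first...
  const inbound = await streamdal.process({
    component: "kafka",
    operation: "consume",
    data: rawMessage,
  });

  const order = JSON.parse(new TextDecoder().decode(inbound));
  order.processedAt = new Date().toISOString();

  // ...and data you're about to write goes through it on the way out.
  const outbound = await streamdal.process({
    component: "postgres",
    operation: "produce",
    data: new TextEncoder().encode(JSON.stringify(order)),
  });
  await saveOrder(outbound);
}
```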

If you want to learn more about the "why" and the origin of this project - you can read about it here: https://streamdal.com/manifesto

— — —

HOW DOES IT WORK?

The SDK establishes a long-running session with the server (using gRPC) and "listens" for commands that are forwarded to it all the way from the UI -> server -> SDK.

The commands are things like: "show me the data that you are currently consuming", "apply these rules to all data that you produce", "inspect the schema for all data", and so on.

The SDK interprets the command and either executes Wasm-based rules against the data it's processing or, if it's a `tail` request, sends the data to the server, which forwards it to the UI for display.

The SDK IS part of the critical path but it does not have a dependency on the server. If the server is gone, you won't be able to use the UI or send commands to the SDKs, but that's about it - the SDKs will continue to work and attempt to reconnect to the server behind the scenes.
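If it helps, here's a very rough, hypothetical sketch of the SDK-side behavior described above. None of these names are the real API - they just illustrate the long-lived command stream, local rule execution, tail forwarding, and the reconnect behavior:

```typescript
// Hypothetical sketch only - not the actual Streamdal SDK internals.
type Command =
  | { kind: "setPipeline"; wasm: Uint8Array } // "apply these rules to all data"
  | { kind: "startTail"; tailId: string };    // "show me the data you're consuming"

interface ServerConn {
  // Stands in for the long-lived gRPC stream of commands (UI -> server -> SDK).
  commands(): AsyncIterable<Command>;
  // Stands in for sending tailed payloads back (SDK -> server -> UI).
  sendTail(tailId: string, payload: Uint8Array): Promise<void>;
}

class SdkSketch {
  private activeTail: string | null = null;

  constructor(private conn: ServerConn) {}

  // Runs in the background; the app's hot path never waits on this.
  async listen(): Promise<void> {
    for (;;) {
      try {
        for await (const cmd of this.conn.commands()) {
          if (cmd.kind === "startTail") this.activeTail = cmd.tailId;
          // A setPipeline command would compile/cache the Wasm module here.
        }
      } catch {
        // Server is gone: keep working locally and retry quietly.
        await new Promise((r) => setTimeout(r, 5_000));
      }
    }
  }

  // Called by the app on every read/write (the Process() call from the post).
  async process(payload: Uint8Array): Promise<Uint8Array> {
    // 1. Run any attached Wasm rules locally (omitted here).
    // 2. If a tail is active, forward a copy to the server for the UI.
    if (this.activeTail) {
      this.conn.sendTail(this.activeTail, payload).catch(() => {});
    }
    return payload;
  }
}
```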

— — —

TECHNICAL BITS

The project consists of a lot of "buzzwordy" tech: we use gRPC, gRPC-Web, Protobuf, Redis, Wasm, Deno, React Flow, and probably a few other things.

The server is written in Go, all of the Wasm is Rust, and the UI is TypeScript. There are SDKs for Go, Python, and Node. We chose these languages for the SDKs because we've been working in them daily for the past 10+ years.

The reasons for the tech choices are explained in detail here: https://docs.streamdal.com/en/resources-support/open-source/

— — —

LAST PART

OK, that's it. What do you think? Is it useful? Can we answer anything?

- If you like what you're seeing, give our repo a star: https://github.com/streamdal/streamdal

- And if you really like what you're seeing, come talk to us on our Discord: https://discord.gg/streamdal

Talk soon!

- Daniel & Ustin




This is awesome, the UI looks beautiful.

I've noticed you've provided Go, Python, and Node SDKs. What's the general tech stack for these? I assume your usage of Protobuf is for consistent schemas between languages?

I ask because I'm curious as to how much work it is to define new SDKs for other languages, as I'd love a Java implementation. Ideally the SDK would be a pretty thin wrapper, simply calling the gRPC service with some minimal error handling - is this the case?


> This is awesome, the UI looks beautiful.

Thank you! I wrote the UI! It's a pretty tricky UI stack, as we push everything to the browser in real time as protobuf over gRPC streaming (using grpc-web and protobuf-ts). There is a lot of mapping we have to do to shape the data properly for React Flow, so we do that server side in Deno before passing it along to the browser. We still have some optimization to do to keep the live tail view zippy, but it's a pretty solid foundation.
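For the curious, here's roughly the kind of mapping we mean. The input shape below is made up and much simpler than the real graph data - the point is just that React Flow wants flat arrays of nodes and edges:

```typescript
// Illustrative only - the real data shapes are more involved.
interface ServiceGraph {
  services: { name: string; components: string[] }[];
}

// React Flow consumes flat node/edge arrays, so nested service data
// gets flattened and positioned before it reaches the browser.
function toReactFlow(graph: ServiceGraph) {
  const nodes = graph.services.flatMap((svc, i) => [
    { id: svc.name, position: { x: i * 250, y: 0 }, data: { label: svc.name } },
    ...svc.components.map((c, j) => ({
      id: `${svc.name}-${c}`,
      position: { x: i * 250, y: (j + 1) * 100 },
      data: { label: c },
    })),
  ]);

  const edges = graph.services.flatMap((svc) =>
    svc.components.map((c) => ({
      id: `${svc.name}->${c}`,
      source: svc.name,
      target: `${svc.name}-${c}`,
    }))
  );

  return { nodes, edges };
}
```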


Hey there - we have documented the tech stack here: https://docs.streamdal.com/en/resources-support/open-source/

Tldr: Go, gRPC, Protobuf, Wasm, Deno, React Flow, TS

And yep, you’re right - we are using protobuf to have a common schema between all SDKs, the server and UI.

Re: SDK implementation - it's basically implementing gRPC methods, knowing how to exec Wasm, and doing a couple of extra things at instantiation. In real terms, it took us about a week to implement the Python SDK - and that's with learning how to do Wasm, Protobuf, and gRPC in Python - plus a week afterwards to iron out edge cases.

Re: Java - that was going to be the next SDK we do, but we have no idea whether it needs to target a specific Java version. Should we target the lowest possible Java version? We need solid Wasm runtime support, so maybe that limits us to newer versions of Java. Is that a problem?

I did Java a looong time ago - so need some outside input at this point haha


Thanks for the info, sounds like you have a pretty solid tech stack :)

Re Java - if you're looking to maximise compatibility, then yeah, you should aim to target an older JDK. Virtually all Java projects use at least JDK 8, so that can be a baseline; however, many enterprise projects would use closer to JDK 18 at a guess (Google's internally aiming to migrate to 21 in 2024). Generally, if there are libraries or features from newer JDKs that you do want to use, I'd say just go for it - since JDK 9 the releases have been every six months (there was a three-year gap between JDK 8 and 9) and much more incremental.

What I would recommend is using Kotlin rather than Java: Kotlin's completely interoperable with Java but provides a much nicer development experience. That way Kotlin clients get niceties such as named parameters [1] (which, together with data classes [2], can pretty well replicate StreamdalClient) and Protobuf DSLs [3], and Java clients still get a first-class, completely interoperable API.

No idea what Wasm support is like for Java - I suspect it's lagging behind other implementations - however the most popular framework is TeaVM.

1: https://kotlinlang.org/docs/functions.html#named-arguments 2: https://kotlinlang.org/docs/data-classes.html 3: https://developers.googleblog.com/2021/11/announcing-kotlin-...


> Re Java - if you're looking to maximise compatibility, then yeah, you should aim to target an older JDK. Virtually all Java projects use at least JDK 8, so that can be a baseline,

Oracle says that JDK 11 is on “Extended Support” which comes after “Premier Support”.[1] Why not just support JDK 17 and higher?

[1]: https://endoflife.date/oracle-jdk


This is solid - thank you very much. We will do some more research, but basically it sounds like: go as low as possible, as long as the underlying libs support it.

And re: kotlin - I last worked/played with it in 2016 and recall that it was MUCH nicer to work in compared to Java.

I just did a quick cursory look and it seems like Kotlin only has slightly slower builds compared to Java, and the rest of the perf is basically the same due to generating similar bytecode. Neat!


This is fabulous work! Really excited for this release! Can't wait to get into all the details and see how I can use it in my organization.


Hey peeps - Dan here - ready to answer any questions you've got!


Congratulations on the launch!

I have been a long-time customer of the enterprise version of Streamdal, and I can confidently say Daniel and Ustin are absolutely KILLER engineers. Any time we spoke, I was always impressed by their super deep experience and understanding of modern challenges!

So good to see you getting some love on HN. Excited to implement this in some personal projects as well!


Thanks Ivan - really appreciate the kind words and support!


The repository's README.md links to <https://docs.streamdal.com/sdks>, which is not there.

Those SDKs are simply libraries though, aren't they? "SDK" often stands for more than that (e.g., development tools, bits of code not properly packaged), and that may be off-putting if you don't want to wrap a project around such an SDK, as opposed to merely incorporating a library.

But then I wonder why it has to be a library at all, limited to just 3 languages: why not implement a more unixy interface, perhaps with named pipes? And/or a library with a C API (so that it can be called from any common language), providing file descriptors to write into. With the former approach, basic named pipes or files and actual tail -f can be used, too.


Re: docs - oops. We were frantically putting stuff together and linking before the docs were in that location - the link is supposed to be https://docs.streamdal.com/en/core-components/sdk/ . Fixed it in the readme.

Re: libs vs SDK - we named it that in anticipation of having to do exactly this kind of funky stuff. As it stands, we are already doing gRPC, Protobuf, and Wasm, and having it all interop across all languages is not easy - so having to introduce some sort of a “helper” binding/lib is not at all unlikely.

Besides that, the “tail” part is really a small part of the functionality - the overall idea is that the sdk/lib has access to most/all I/O of the app and is able to interact with the payload that the caller provides before it is sent on its way.

Traditional pipes aren’t really in the equation.

We went with calling it “tail” because it's easier to explain than “it's a lib that an app owner can wrap their I/O calls with to enable calling dynamic Wasm”… and that's still not the whole thing haha

Last, here’s a diagram depicting the flow: https://docs.streamdal.com/_astro/StreamdalSdk.cd7c8d45.png


Node support - sweet!! Is there any way to interact with the data as it comes in real-time?


Hey there! I wrote the node-sdk for this. You can use it to execute Wasm-based rules on your data and interact with the data in real time - with the Wasm rules you can do things like detect and mask PII, etc. The node-sdk is here: https://github.com/streamdal/node-sdk. There are some minimal examples of these pipelines in the readme and examples/sandbox directories.


I'd like to use it but I have strict requirements for data not leaving our network


Good news then :) Everything stays on your network. Actually, in most situations - everything stays completely client-side. Because the rules that the client executes are Wasm modules, all data inspections and transformations occur in the client itself.

There is a server component (that you host) - but it is only used for pushing rules/Wasm down to the SDKs and for facilitating tail - that's it.


Obligatory: We already have an open-source tail -f; it's called `tail -f`. (Kind of /s, but much like the infamous Dropbox comment[0], it sucks that there exists a problem for this to solve in the first place.)

0: https://news.ycombinator.com/item?id=9224


I wonder if we screwed up by calling out “tail” - it is so much more than that - it executes Wasm rules on the client that are pushed to it by the server, AND because we have access to the data, we can expose a UI to see it flowing… like a “tail -f” - but that doesn't quite roll off the tongue :)

I'd urge you to check out the live demo and “tail” an app at runtime - it might explain what we are doing better than I can.


I don't actually have a problem for this to solve - like the rsync guy, I know how to use actual `tail -f` (and eg `perl -E` et al) - I'm just self-aware enough to realise that most people either aren't up to the task of hacking together whatever functionality they'd actually end up using out of generic utilities at all, or at least wouldn't consider that so easy as to be the path of least resistance to get something done.


But can `tail -f` handle data?


I know you kid - but the _data_ in this context is the data that the app is processing at runtime. I.e., if the app is reading from a DB, that's what we are tailing.

And if that’s already clear - sorry :)


It wasn’t, thanks. :)


Congrats on the lunch Dan and Ustin!


It was tasty : )


Maybe I need to read the docs in more detail, but a question that comes to mind is what is the impact on latency and reliability? Can StreamdalSDK.Process be made asynchronous so that any delay in processing the data, or unavailability of the server, has minimal impact on the flow of the application? That is, to make the observing as passive as possible?


In the node-sdk, the process pipeline is async, see: https://github.com/streamdal/node-sdk/blob/main/src/streamda... (I'm the author of that). I believe this is also the case for the Python and Go SDKs. So you can call it asynchronously for passive observability.

However, we implemented the pipeline rules in Wasm with the goal of keeping the overhead as minimal as possible. So you could also use it as more of a data security or governance tool, invoking the pipelines synchronously to mask or block sensitive data before passing it along.
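As a purely hypothetical sketch of those two modes (the process() signature below is an assumption, not the actual node-sdk API - see the readme for the real one):

```typescript
// Hypothetical signature - see the node-sdk readme for the real API.
declare const streamdal: {
  process(input: { data: Uint8Array }): Promise<{ data: Uint8Array }>;
};
declare function doRealWork(data: Uint8Array): Promise<void>;

// Passive observability: fire-and-forget, so the hot path never waits.
async function onMessageObserved(payload: Uint8Array): Promise<void> {
  void streamdal.process({ data: payload }).catch(() => {});
  await doRealWork(payload);
}

// Governance mode: await the pipeline so the masked/blocked version is
// what actually flows downstream.
async function onMessageGoverned(payload: Uint8Array): Promise<void> {
  const { data } = await streamdal.process({ data: payload });
  await doRealWork(data);
}
```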


Thanks for sharing. Looks simple and easy to use.

Could you use this for batch data jobs as well? I would imagine having integration with batch job frameworks like Spark would make this more valuable from an organisation perspective.

Also, a small note on the website: on the "What is Streamdal" page, for example, I feel like I'm bombarded with emojis, bold and italic text, links, etc.


Hmm, good idea about the Spark integration - integration is possible with basically anything you've got code-level access to. I don't know about the messaging though - runtime data transformations for Spark? I guess data folks would have no problem with that, hmmm.

And re: emojis - we’ll tone it down - we were all working hard on docs late into the night and may have gotten a little wild with emojis haha :)


Been using Streamdal for a little while now to catch data inconsistencies introduced by human input on our data pipelines, and it's been a huge improvement and game-changer for keeping things running smoothly. Thanks team!


That console UI looks beautiful!


Thanks Tim! We're working toward having a visually appealing UI, especially for open source!


Is there not a literal "tail -f" type client to send data from log files?


We live more on the preventive side: we tap into the real-time data flow before it reaches logs or data stores, which lets you be preventive rather than reactive.


Awesome work! Love this


Love the implementation here, much needed monitoring approach on data pipelines


Congrats on the launch! Been waiting for this.


Grats on the launch!



