Deepstream 5.0: Resurrected using MIT license (deepstream.io)
108 points by yasserf on Oct 27, 2019 | 37 comments



The biggest pain point I've experienced with these pubsub services is how to negotiate client connections so that all the subscribers to a topic are directly connected to the same machine. In other words, clients' messages do not have to ever be relayed at the networking level. This means providing some reconnecting or connection-info data for the client, and it's why I think most of these frameworks fail to improve on home cooked solutions: you have to touch the message to help squeeze out efficiency.

A non-IP based routing mechanism requires some kind of header/parsing/deserialization of the message being passed. In application level code this is quite slow, you have to query some kind of map synchronized about cluster nodes. This is especially shitty if message addresses are ephemeral, especially per RPC call (which is the default behaviour in most cluster-based RPC frameworks), because then your near cache of the cluster map is never populated. Alternatively you can just broadcast to everyone (skip the cluster map) but then why cluster?

It would be great if we just had IPv6 adoption though. Then you could just uh, route. The DNS based routing in stuff like Kubernetes is close but not for external Internet facing clients. The whole point is to avoid a bastion host.


I totally agree. Having a single point of entry gives us some small benefits, like a single connection and simplified deployments, but it isn't efficient to have a provider connected on one node in a cluster and have to forward data to subscribers on another just because of how they were distributed by a load-balancer. There's actually an action in the handshake protocol which allows a node in a cluster to redirect a connection to a different/more optimal node. It was used for multi-tenancy clusters previously. In theory you can run multiple isolated clusters (or individual nodes) and have multiple connections from the client, routed based on which subscription lives where. It's a pattern I have been talking about with a couple of users but it hasn't been required by anyone yet. It isn't as efficient as doing it on a network layer but it at least reduces intercluster traffic dramatically.
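To make the "multiple connections, routed by where the subscription lives" idea concrete, here is a rough client-side sketch. It is not the handshake redirect mentioned above and it uses no deepstream API: the node URLs, `nodeForTopic`, `createRouter` and the `connect` callback are all made-up names, and the static hash mapping is just one assumed way of deciding which node owns a topic.

```javascript
// Hypothetical node URLs, for illustration only.
const nodes = [
  'wss://node-a.example.com',
  'wss://node-b.example.com',
  'wss://node-c.example.com'
];

// 32-bit FNV-1a hash over the string's UTF-16 code units.
function fnv1a(str) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    hash ^= str.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash;
}

// Every client that uses the same node list maps a topic to the same node.
function nodeForTopic(topic, nodeList) {
  return nodeList[fnv1a(topic) % nodeList.length];
}

// Opens at most one connection per node and reuses it for every topic
// that maps there. `connect` is supplied by the caller (assumption).
function createRouter(nodeList, connect) {
  const connections = new Map();
  return function connectionFor(topic) {
    const url = nodeForTopic(topic, nodeList);
    if (!connections.has(url)) {
      connections.set(url, connect(url));
    }
    return connections.get(url);
  };
}
```

With this shape a client holds at most one connection per node rather than one per topic. The obvious catch is that a plain modulo mapping reshuffles most topics whenever the node list changes.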


I used to be really concerned about this, but one day I sat down and did the math and realized that the worst case is no more than twice as many messages, which means this isn't a scalability issue. If a topic has N subscribers, at bare minimum you need N+1 messages to deliver a given event (one to receive it into the cluster, and N to dispatch it to the subscribers). If everyone is connected to some random machine, then you need to take the incoming event and send it to the internal machine that is storing that topic (one extra message), and that machine then has to relay the message to the machines the subscribers are connected to (one extra message per subscriber), so you end up with 2N+2 messages. The alternative, where users cluster themselves by topic through their entry connections, is that individual end users have to maintain a ton of separate inbound connections, one for each topic they are interested in, which is often going to work out to be much worse for everyone involved.
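The count above is easy to check with a few lines. This is only a sketch of the arithmetic in this comment; `messagesPerEvent` is a made-up helper, not part of any library.

```javascript
// Counts the messages needed to deliver one event to `subscribers`
// subscribers. `relayed` models the worst case where neither the
// publisher nor any subscriber is connected to the topic's machine.
function messagesPerEvent(subscribers, relayed) {
  // One message into the cluster plus one out to each subscriber.
  const direct = subscribers + 1;
  if (!relayed) {
    return direct;
  }
  // One extra hop to the machine holding the topic,
  // plus one extra hop per subscriber on the way out.
  return direct + 1 + subscribers;
}

for (const n of [1, 10, 1000]) {
  const direct = messagesPerEvent(n, false);
  const relayed = messagesPerEvent(n, true);
  console.log(`N=${n}: direct=${direct}, relayed=${relayed}, ratio=${relayed / direct}`);
}
```

Since (2N+2)/(N+1) = 2 for every N, the relayed case costs a constant factor of two and does not grow with the number of subscribers.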


Wouldn't the ingress help here with sticky sessions, e.g. cookies or a header?

I am assuming the RPC is happening over WebSockets though.


What is deepstream? The website doesn't really say what it is for.


https://deepstream.io/tutorials/concepts/what-is-deepstream/

Generally deepstream is an alternative to using firebase, socket.io, featherJS and meteor. However rather than putting the logic within the server you instead run them as microservices and allow deepstream to handle load-balancing for you.

The two main things deepstream offers are:

- Events

You can publish and subscribe to events by simply doing:

deepstream.event.emit('event-name', data)

deepstream.event.subscribe('event-name', data => {})

Which is the functionality you expect to see from most pubsub mechanisms

- Records

Records are events with persistence. They are more heavyweight objects that persist their content in a cache + db and across all connected devices without the user/developer having to actually do anything. They also support offline use if you want to be able to get/set data while offline.

// get a record

const record = client.record.getRecord('unique-data-name')

// wait for record to be loaded from server

await record.whenReady()

// subscribe to any changes that happen in the future

record.subscribe(data => console.log('someone updated this object!', data))

// discard when finished

record.discard()

You can also set data directly without having to subscribe using

client.record.setData('some-record', newData)

There are multiple other aspects deepstream provides like merge-conflicts, permissions, authentication and some useful patterns, but the above code is about 90% of what you would expect a user to use in deepstream.
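For anyone unsure what "events with persistence" means in practice, here is a toy in-memory sketch of the difference. It is not deepstream code and shares nothing with its implementation: `ToyRecordStore` and its methods are invented for illustration, and there is no server, cache, database or cross-device sync involved.

```javascript
// Toy illustration only: an event bus that also remembers the last value.
class ToyRecordStore {
  constructor() {
    this.values = new Map();    // record name -> last value set
    this.listeners = new Map(); // record name -> array of callbacks
  }

  // Notifies current subscribers like an event would, but also stores
  // the value so it can be read again later.
  setData(name, value) {
    this.values.set(name, value);
    for (const callback of this.listeners.get(name) || []) {
      callback(value);
    }
  }

  // Late joiners can read the current state; a plain event would be gone.
  getData(name) {
    return this.values.get(name);
  }

  subscribe(name, callback) {
    if (!this.listeners.has(name)) {
      this.listeners.set(name, []);
    }
    this.listeners.get(name).push(callback);
  }
}

const store = new ToyRecordStore();
store.setData('some-record', { status: 'created' });
// A client that arrives after the update still sees the data.
console.log(store.getData('some-record'));
```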


It's a real-time pub/sub messaging system (competitive to NATS), with built in messaging patterns like request/response.

Also supports persistence of data records (JSON docs) and syncing changes to those docs (competitive to Firebase, etc), backed by Redis.

Works with different protocols (websocket, MQTT, HTTP) and has client libraries for popular languages.


Yeah odd. The github link though has ...

> deepstream is an open source server inspired by concepts behind financial trading technology. It allows clients and backend services to sync data, send messages and make rpcs at very high speed and scale.


I added it to the CNCF Cloud Native Interactive Landscape:

https://landscape.cncf.io/category=streaming-messaging&forma...

It's the 24th streaming & messaging project or product we're tracking.


Please add some text at the top explaining what it is.



thanks for the feedback! Do you mean in the actual Hero or to elaborate more within the 'what it is' section on the home page?


I spent 5 minutes trying to figure out what Deepstream is, because I was intrigued. I was unable to figure it out based on the front page, documentation link and the GitHub repo.


What sort of documentation would make most sense? We went through iterations of example animated apps, code samples and text but can't seem to explain it clearly yet =( Any input would be much appreciated!


I think Facebook does a fantastic job of demonstrating the value of their tooling that they open source.

Buck: https://buck.build/

Presto: http://prestodb.github.io/

The front page of these contain the project name, one-line elevator pitch, long form description, and an automatically playing demo of the product.


This is a great comment. I've come across deepstream over the years. I've always come away thinking that I could make something similar that would serve my needs without adding a dependency into my project. You need to demonstrate some compelling feature.

Eg. sync

Naive copy sync: 53ms

DeepStream sync: 6ms


I would say the compelling feature is it's opensource 'serverless' data-sync with permissions, clustering, monitoring and auth built in. Serverless here meaning there is no server code required, just run the server. The easiest way to market it would just be the OS competitor to firebase.

The issue with that however is taking that approach means we lose the NodeJS ecosystem support which is massive (and required to add custom plugins/maintain the server). We tried it before and it was terrible in terms of metrics and participation.

Thank you for the feedback though! I will definitely be looking at redesigning the home page and will use your feedback when doing so!


> The issue with that however is taking that approach means we lose the NodeJS ecosystem support which is massive (and required to add custom plugins/maintain the server).

Could you explain what you mean here in greater detail? I'm only talking about changing your marketing copy. I don't understand why you lose the NodeJS ecosystem.

By the way I'd also remove most of the content from the front page. It's too busy. Make every word count.


Yeah, when we made deepstream sound more like a standalone deployment, similar to nginx or rethinkdb (meaning you can get an executable/install via package managers and so forth), and downplayed the NodeJS part of it, we ended up having a lot fewer people contributing to the project or even using it, as they wouldn't be aware that it could be installed via NPM. I guess my point is that the status quo for node server dependencies seems to be that they run as part of a bigger project (featherjs, meteor, socket.io, socketcluster), which means you npm install and configure it via javascript. The last couple of versions have been me trying to navigate the landscape so I can provide a totally standalone/configuration based server that can also be extended by end users (hence all the typescript interfaces and support).

Basically I'm not certain how to pitch this exactly. I take a bit of consolation from the fact that when it was a startup with over ten employees we had the same issue (especially since I was the tech guy).

Anyways, more than happy to hear any suggestions! I've been involved with this project for over 3 years so I definitely have a biased view when trying to see the project for the first time.


I'd focus on sync as your headline feature. It's technologically complex to do well and it provides real value for mobile apps that want to offer a great offline experience. Our phones are offline way more than most of us realize and we don't notice it because Google / Apple have done such a great job with sync. But many 3rd party apps are not great at that and consequently have crappier user experiences due to connectivity problems.


That's really not what serverless means (especially when right after you say just run the server). I'd recommend leaving that out. Fewer buzzwords are better.


> What sort of documentation would make most sense?

A single sentence in the landing page with a clear description of what it is and what it does would suffice.


Just describe what deepstream is in one sentence on your landing page. Right at the top. "Deepstream is a pubsub solution that ...".


It isn't just a pubsub solution though

To compare to other frameworks out there:

featherjs: A framework for real-time applications and REST APIs

meteor: THE FASTEST WAY TO BUILD JAVASCRIPT APPS

deepstream: a fast and secure data-sync realtime server for mobile, web & iot

The problem is that once you provide more functionality than pub/sub you end up in this vague toolkit statement land.

I would assume that the issue is more around the fact that the term data synchronization is still not used as often and so doesn't give the same familiar feeling people get when looking at other pub/sub frameworks. Deepstream supports HTTP/MQTT/Binary and JSON websocket protocols for rpcs, records, events and presence. Maybe that should be the sentence!

A secure scalable realtime server that supports HTTP, MQTT and WebSockets for rpcs, data-sync, pubsub and presence functionality


Just my opinion, but the two most useful pieces of information that helped me understand what this does and visualize where it fits were

1) https://deepstream.io/tutorials/concepts/what-is-deepstream/ > "What is it for" section

2) and your comment that it's an alternative to firebase, socket.io, meteor.

Perhaps you can think of using these two pieces of information as the front-page blurb rather than make it abstract.


How well does Deepstream work with offline periods, e.g. stuff being created/modified on a disconnected mobile device that later connects and needs to sync?

Is it possible to make it work in a way that the server only sees encrypted data? I can see it working in a clunky kind of way if each client had two versions of each document, a clear one that is operated on, and one encrypted (possibly a record at a time or something) that is synced.


Looks promising! Does deepstream work well as a standalone server or is it best used in combination with express.js if I am building a real-time web app?


Deepstream works best as a standalone server!

Ideally you would just configure it using a config file for your permissions (https://deepstream.io/tutorials/core/permission/valve-introd...) and use http auth to authenticate your users (https://deepstream.io/tutorials/core/auth/http-webhook/).

You can see this sort of configuration here https://deepstream.io/guides/live-progress-bar/

If you want you could add a custom plugin that would take the HTTP server within deepstream and enhance it using express for all your non-deepstream routes, if you raise an issue for it I think it's something that can be done in the near future! But generally wouldn't recommend for non-pet projects since they serve two different purposes.

Edit: Also thank you!


Here is the project description because it took way too much navigation to find:

> deepstream is an open source server inspired by concepts behind financial trading technology. It allows clients and backend services to sync data, send messages and make rpcs at very high speed and scale.


Phoenix channels does the same thing and can fall back to long polling while relying on the Erlang VM to scale out to many more machines.


Why MIT? I recall it was AGPL before.


Yup! We changed it to MIT as AGPL actually limited a few people from using it in their companies (company wide license regulations) and I would rather promote adoption under a more permissive license.


Do you have .NET Core driver/client? or at least gRPC endpoint?


I guess it's time to build my own cryptocoin exchange.


We actually had a couple of usecases that used deepstream for that sort of thing.

https://vimeo.com/143728632 was what led to deepstream being initially written a few years back


The UI looks incredible - was it closed source, and did it die with the company?


Thanks! We had an awesome UX designer as a co-founder. It was closed source yup, also based on a pretty old stack of knockout and CSS so it isn't very salvageable



