
The "stochastic probability stuff" falls apart once ISPs discover they can profitably influence the consensus outcome by messing with packet arrival times between nodes.



Yeah that's an interesting angle. I believe the problem in the first place is forcing the network to come to global consensus. In my project the idea is to form "lazy consensus" (i.e. just enough consensus for everyone to have a consistent outlook on the world).

Datalisp is a data interchange format rather than a cryptocurrency, though, so it behaves a bit differently.


"Just enough consensus" in BFT settings is necessarily 2/3 vote from all votes on agreement. That's a tall order.


Why is that necessary?

Edit: To be a bit less standoffish and make my position clear.

You only need to form a consensus with the part of the network you are aware of. Datalisp leans into the properties of propagator networks (fully CP systems) to avoid needing to synchronize everyone eagerly.


Try googling "Byzantine Fault Tolerance" and read the original papers by Leslie Lamport. 2/3 is required to ensure liveness: tolerating f Byzantine faults takes n >= 3f+1 nodes, i.e. quorums of more than 2/3.


Yeah, so if you look at some commentary around that paper, there is a variant of the problem, "approximate agreement", that can solve, for example, the 3-person variant. This fits with the theory (really just a hypothesis at this point, but actively worked on) that I have w.r.t. datalisp.

Edit: see for example slide 30 of this slideshow: https://www.cs.cmu.edu/~15712/lectures/15-bft.pdf

I'm not (yet) very good at probability theory, so it's a bit slow going filling in the model I have in mind (that is, deciding the message format), but propnets are the correct framework to fit it into.


If you read the Avalanche paper, you'll see early on that all of its claims to being BFT stem from an assumption about the distribution of message arrival rates in the system. That is a very generous assumption that can easily be wrong. Also the original BFT literature makes no assumptions about the arrival times or even reliability of the network -- BFT constraints apply even if the network is 100% reliable. It only gets worse from there; 2/3 honest votes is the best you can do under ideal circumstances.


I am in no way trying to defend avalanche. I have been speaking from the standpoint of datalisp.

Again, the Byzantine generals problem of the paper is not an accurate description of what is happening in the datalisp network and does not apply except locally (in a 'neighbourhood' of the dispute), so it's not 2/3 of the whole network but rather 2/3 of the echo chamber that needs to agree...

Anyway you seem to be eager to dismiss off-hand so I am losing interest in debating with you as-is.


So is global agreement on state a criterion for datalisp's correct operation, or not?

If not, then why compare it to Avalanche at all?


Could you please read our exchange from your first comment and tell me how you parse it?

From what I can tell, I agree with your dismissal of avalanche and offer an intuition for what may be the problem by contrasting it with my work-in-progress idea; then we have a series of back-and-forths where I consistently mention datalisp but never once talk about avalanche.

Regarding global agreement on state: correct, it is not a criterion for datalisp. Datalisp approaches the network like a sheaf (locally consistent but not necessarily globally so) and uses the properties of propagator networks (which are fully CP and therefore converge on a fixpoint) to reason about convergent consensus that executes lazily.
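To make the propnet point concrete, here is a minimal sketch (plain Python, hypothetical names, nothing datalisp-specific): a cell whose merge operation is a lattice join reaches the same fixpoint no matter what order peers' messages arrive in, which is why eager synchronization is unnecessary.

    import itertools

    # A toy propagator cell over a join-semilattice (here: max over floats).
    # Join is commutative, associative, and idempotent, so every
    # interleaving of the same messages converges to the same fixpoint.
    class Cell:
        def __init__(self, bottom=float("-inf")):
            self.value = bottom

        def merge(self, incoming):
            self.value = max(self.value, incoming)  # lattice join

    msgs = [0.2, 0.9, 0.5]
    for order in itertools.permutations(msgs):
        cell = Cell()
        for m in order:
            cell.merge(m)
        assert cell.value == 0.9  # same fixpoint for every arrival order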

The casual dismissal of datalisp (as it stands) is that the message format for communicating your beliefs w.r.t. the probability of signal-to-noise is not fully worked out yet, and all we have is a series of heuristic arguments.

However, datalisp is a data-interchange format first and foremost, optimized for building authenticated datastructures. It is not specialized for cryptocurrencies per se (since the focus is on second order Byzantine fault tolerance).


This would be going a lot more smoothly if you could articulate which distributed systems properties, exactly, you intend datalisp to provide. So far, you've contradicted yourself at least once here:

> I believe the problem in the first place is forcing the network to come to global consensus. In my project the idea is to form "lazy consensus" (i.e. just enough consensus for everyone to have a consistent outlook on the world).

So, which is it? Not coming to global consensus and "just enough consensus for everyone to have a consistent outlook on the world" seem mutually exclusive.

Don't be afraid to lay down a bold system description, and don't hold back on the specifics. Not trying to brag, but I have a PhD in distributed computing, I work on a blockchain at my day-job (https://github.com/blockstack/stacks-blockchain), and I chair its governance process. I'm pretty sure I'll be able to understand a rigorous system design. But, I don't have very much patience for weasel-wording, pussyfooting, or hand-waving.


I had already looked at your profile so I am aware of your credentials.

In the original comment I mention my project and point to resources for learning more. I don't know why you think I am "weasel-wording, pussyfooting or hand-waving" when what is in fact happening is that we are having this discussion in a place that is not optimal for writing pages of explanation but rather for high-level discussion (which is necessarily brief on details but may link to resources).

I will be the first to admit that what I am doing is not fully worked out yet. Then again, you are familiar with research work, so you know things take a while to become polished enough to stand up to surface-level scrutiny even if the ideas have merit from the beginning. The fact that I am working in the open and engaging people like you in discussion should not be considered "weasel-wording" or whatever slur you come up with.

You claim I am contradicting myself yet you do not elaborate on how. I have already given you the argument for why it is not a contradiction: global consensus happens eventually due to convergence, and "lazy" in computer science is another way to say "demand driven", i.e. we converge as much as we need to in each "neighbourhood" of the network (the sheaf reference should further sharpen your mental image of what is going on).

If you want to go into details with me and discuss things more thoroughly, then I suggest you join the datalisp channel on Telegram. I would be happy to hear your criticism and try to understand better in which ways I may be lacking. After all, what we all want is better systems of governance, and if you are intrigued by this approach (or rather, this high-level description of a possible approach) then from my point of view our interests are aligned.


> But, I don't have very much patience for weasel-wording, pussyfooting, or hand-waving.

This was meant in a general sense; it wasn't an accusation. I apologize if it was interpreted otherwise.

I was hoping to see more about which distributed systems properties datalisp will provide. It's fine if you're not ready, but if so, then maybe you could speak more to the specific problems you're trying to solve?


Sure, datalisp is really a very simple idea.

First, we take canonical S-expressions (designed by Ron Rivest for cryptographic signing) as the syntax of the "language" (again, it's primarily a data-interchange format meant to be used to build distributed systems).

We enforce the restriction that csexps must have versioned operators; that is, (2:op1:0...) would be the operator `op` at version `0`, where the version is a natural number.
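To be concrete, a minimal sketch of the encoding (hypothetical helper names, not the final datalisp wire format): csexp atoms are netstrings, <length>:<bytes>, and every list leads with an operator atom followed by a version atom.

    def atom(b: bytes) -> bytes:
        # canonical S-expression atom: <length>:<bytes>
        return str(len(b)).encode() + b":" + b

    def node(op: str, version: int, *args: bytes) -> bytes:
        # every list starts with a versioned operator
        return (b"(" + atom(op.encode()) + atom(str(version).encode())
                + b"".join(args) + b")")

    print(node("op", 0, atom(b"hello")))  # b'(2:op1:05:hello)'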

That's it for the data interchange format; now for the semantics of the language.

As I have been saying, the primary idea is to use propagator networks (propnets). The name datalisp comes from datafun, which is an extension of datalog. Datalog is something you are probably familiar with: it's not Turing complete, and it is a constructive logic programming language that uses "relations" and postulated "facts" to infer new "facts" - i.e. it builds out a transitive closure of sorts.

Datafun takes this and tries to extend it as much as possible without becoming Turing complete, so rather than being limited to the Boolean lattice of normal logic it uses any lattice and monotone functions, e.g. a propnet. However it tries to do type inference and be a "programming language".

One nice property of datalog is that its evaluation semantics coincide with the distance-vector shortest-path algorithm used to do routing on the internet, except datalog is declarative rather than imperative, so it's an interesting way to do routing. The idea of datalisp is to make the "XPath of canonical S-expressions" be a datafun-ish language that can then also do the routing.
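Concretely, the datalog rule dist(x,z) <- dist(x,y), edge(y,z) with min-aggregation is Bellman-Ford run to a fixpoint; a toy sketch (hypothetical, plain Python, not datalisp syntax):

    INF = float("inf")
    edges = {("a", "b"): 1.0, ("b", "c"): 2.0, ("a", "c"): 5.0}
    dist = {("a", "a"): 0.0}  # base fact: dist(a,a) = 0

    changed = True
    while changed:  # iterate the rule to a fixpoint
        changed = False
        for (y, z), w in edges.items():
            for (x, y2), d in list(dist.items()):
                if y2 == y and d + w < dist.get((x, z), INF):
                    dist[(x, z)] = d + w  # monotone descent in the min lattice
                    changed = True

    print(dist[("a", "c")])  # 3.0 via b, beats the direct 5.0 edge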

Now, the reason this has anything to do with cryptocurrencies is the claim that the internet is a propnet where each place is a "max-signal lattice". The idea being that each user tries to put the computer into a state that most aligns with their intent (`max` over a total order - in this case ordered by "signal" - is a lattice); therefore they will not choose to persist messages that are spam, i.e. messaging in the network is monotone (at the limit you have security and only choose to replace information if it rewards you with higher signal).

The goal of the datalisp network is to make this intuition concrete. There are a lot of details, but this is the general gist.

So the anti-sybil mechanism in the network is called "proof of trust": we make participants in the network sign their messages and include a prediction of the signal-vs-noise of their message. Peers in the network keep a prior for how much to trust the self-evaluation of each key, which they then use to order the message.
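A toy sketch of the prior I have in mind (hypothetical names, the real message format is not settled): each peer keeps a Beta prior per key over how often that key's self-evaluation turns out to be accurate, and orders messages by trust times claimed signal.

    class TrustPrior:
        def __init__(self):
            self.hits, self.misses = 1.0, 1.0  # uninformative Beta(1,1)

        def expected_trust(self) -> float:
            return self.hits / (self.hits + self.misses)

        def observe(self, self_eval_was_accurate: bool):
            if self_eval_was_accurate:
                self.hits += 1.0
            else:
                self.misses += 1.0

    def priority(prior: TrustPrior, claimed_signal: float) -> float:
        # order incoming messages by trust-weighted self-evaluation
        return prior.expected_trust() * claimed_signal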

Second order Byzantine fault tolerance is a class of problems, and the representative problem we try to solve (i.e. a "second order decentralization complete problem") is deciding whether to merge a patch and rebuild your system in a Byzantine setting. If you can do that, then you can of course also do a ledger that simulates some formal language. This is a fancy way of saying we want a way to do updates in a distributed system with no single source of truth.

The consensus mechanism is based in lattice theory. We have "join consensus", which is a race of sorts - you can always merge a patch ahead of the network if you like (stay on the "bleeding edge", so to speak) - and then there is "meet consensus", which is based on the avalanche algorithm and is used to move up the bottom and collapse all patches (or, more generally, metadata) into its state.
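In lattice terms, with node states as sets of merged patches (a toy sketch, not the actual protocol): the join is what any single node may run ahead to, the meet is what the commons can safely collapse to.

    from functools import reduce

    node_states = [
        {"patch-1", "patch-2"},           # a node on the bleeding edge
        {"patch-1"},
        {"patch-1", "patch-3"},
    ]

    join = reduce(set.union, node_states)         # everything anyone merged
    meet = reduce(set.intersection, node_states)  # what everyone has joined

    print(join)  # {'patch-1', 'patch-2', 'patch-3'}
    print(meet)  # {'patch-1'} -- safe to collapse into the bottom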

So the semantics of datalisp end up being something similar to "probabilistic datafun": if you are familiar with semi-naive datalog evaluation semantics, it's a bit like that, except we also have a cutoff w.r.t. how much confidence we are willing to tolerate before using a fact or inference. Instead of programming datalisp in a text editor, it is more like a GraphQL of sorts; it /is/ the internet, or rather it reifies the propnet datastructure. The reason we care so much about canonicality and consensus is so that we can take advantage of content addressing.
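A toy of what I mean by a confidence cutoff (assumed semantics, nothing final): derive trusts(x,z) from trusts(x,y) and trusts(y,z) by multiplying confidences, and refuse to use anything that falls below the cutoff.

    CUTOFF = 0.5
    facts = {("a", "b"): 0.9, ("b", "c"): 0.8, ("c", "d"): 0.6}

    changed = True
    while changed:
        changed = False
        for (x, y), p in list(facts.items()):
            for (y2, z), q in list(facts.items()):
                conf = p * q
                if y2 == y and conf >= CUTOFF and conf > facts.get((x, z), 0.0):
                    facts[(x, z)] = conf
                    changed = True

    print(facts.get(("a", "c")))  # 0.72: derived and kept
    print(facts.get(("a", "d")))  # None: 0.432 falls below the cutoff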

I have some ideas about building mixnets and simulations of different political systems on top of this framework; then the idea is to do composition via constant function market makers, since they are convex and therefore solve the tragedy of the commons (local optimization is global optimization). Since the network is decentralized you can be Byzantine, that is, you can make your own identifiers in datalisp (each operator has an infinite namespace due to versioning; we use the probabilistic logic programming and metadata gossiping in the network to find out which schema the data is supposed to fit; if there is a namespace clash we form a local consensus to re-version one of the clashing names to the first unused version).
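The clash rule I have in mind is just first-fit over the version namespace; a sketch (hypothetical, the actual rule would be whatever the local consensus agrees on):

    def reversion(op: str, taken_versions: set[int]) -> int:
        # move a clashing operator to the first unused version number
        v = 0
        while v in taken_versions:
            v += 1
        return v

    print(reversion("op", {0, 1, 3}))  # 2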

The properties of lattices allow you to "zoom out" of such echo-chamber bickering; this is how anti-sybil works.

There are a lot of ideas, basically; I hope you can see the general picture of what I am planning.


> However it tries to do type inference and be a "programming language".

"tries to do"? It either does or it doesn't. If you're not sure which, then it doesn't. Let's not use weasel words and misrepresent what the system actually does.

> One nice property of datalog is that its evaluation semantics coincide with the distance-vector shortest-path algorithm used to do routing on the internet,

Routing on the Internet is famously not shortest-path, so I don't know how to interpret the rest of this.

> The idea being that each user tries to put the computer into a state that most aligns with their intent (`max` over a total order - in this case ordered by "signal" - is a lattice); therefore they will not choose to persist messages that are spam, i.e. messaging in the network is monotone (at the limit you have security and only choose to replace information if it rewards you with higher signal).

You seem to be describing a system where anyone can submit as many messages as they want, but it's up to each node to decide which ones are worthy of consideration ("not-spam") and which ones are not ("spam"). Identifying messages as spam is a notoriously hard problem. It's not even truly solved with email, for example; Cory Doctorow's humorous checklist applies here: https://craphound.com/spamsolutions.txt

Which brings me to:

> So the anti-sybil mechanism in the network is called "proof of trust": we make participants in the network sign their messages and include a prediction of the signal-vs-noise of their message. Peers in the network keep a prior for how much to trust the self-evaluation of each key, which they then use to order the message.

This is not an anti-sybil mechanism in any sense of the word. If anyone can send arbitrarily many messages (of which arbitrarily many can be spam) and run arbitrarily many nodes, then any attempt to predict the ratio of not-spam to spam messages is going to fail. Someone who wanted to break this system could choose how many nodes to run and how many spam/not-spam messages to send in order to trick other nodes into mis-classifying messages 100% of the time.

If you want a better understanding of sybils and how to deal with them (specifically, what not to do), you should read the original paper: https://www.microsoft.com/en-us/research/wp-content/uploads/...

But, know that sybil-resistance is not the same as BFT.

> Second order Byzantine fault tolerance is a class of problems, and the representative problem we try to solve (i.e. a "second order decentralization complete problem") is deciding whether to merge a patch and rebuild your system in a Byzantine setting.

Making up new words is a great way to confuse and annoy your readers. If you want nodes to decide whether to make forward-progress when other nodes can fail arbitrarily, you need Byzantine fault tolerance. There's a plethora of papers and real-world implementations of high-performance BFT agreement protocols. The language and data format has absolutely nothing to do with the agreement problem.

> The consensus mechanism is based in lattice theory. We have "join consensus", which is a race of sorts - you can always merge a patch ahead of the network if you like (stay on the "bleeding edge", so to speak) - and then there is "meet consensus", which is based on the avalanche algorithm and is used to move up the bottom and collapse all patches (or, more generally, metadata) into its state.

It sounds like your nodes perform some kind of speculative execution of state-transitions that they somehow believe will be accepted by the network, even though they have no confirmation that they have been or will be. This is all well and good -- speculative execution of a happy path is a well-known design tactic. But that's not the hard part of distributed system design; the hard part is handling the case where the happy path is unavailable. What does the unhappy path look like?

> So the semantics of datalisp end up being something similar to "probabilistic datafun": if you are familiar with semi-naive datalog evaluation semantics, it's a bit like that, except we also have a cutoff w.r.t. how much confidence we are willing to tolerate before using a fact or inference. Instead of programming datalisp in a text editor, it is more like a GraphQL of sorts; it /is/ the internet, or rather it reifies the propnet datastructure. The reason we care so much about canonicality and consensus is so that we can take advantage of content addressing.

It sounds like your nodes have no way to actually agree with one another with any certainty, so I take this with a huge bucket of salt.


First, thanks for taking the time.

Alright, I realize that what I wrote out on the spot was quite hand-wavy. The devil is in the details. Also, I am honestly a bit perplexed that you are so insistent on not looking at the (terrible) paper while being willing to invest in the discussion at all.

For example, the spam comments: in the paper I mention a construction from a project called Vac that deals with this. Bandwidth can be throttled per node per epoch because each message leaks a secret share of a key that put up collateral.
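To sketch the construction I mean (the rate-limiting-nullifier idea from Vac, if I recall it correctly; parameters hypothetical): each message in an epoch leaks one Shamir share of the key backing the collateral, so a second message in the same epoch hands observers two points on a degree-1 polynomial and therefore the key itself.

    P = 2**61 - 1  # a prime field

    def share(secret: int, slope: int, x: int) -> tuple[int, int]:
        # one point on the line y = secret + slope*x (mod P)
        return x, (secret + slope * x) % P

    def recover(p1: tuple[int, int], p2: tuple[int, int]) -> int:
        (x1, y1), (x2, y2) = p1, p2
        slope = (y2 - y1) * pow(x2 - x1, -1, P) % P
        return (y1 - slope * x1) % P  # the key, now slashable

    key, slope = 123456789, 987654321  # slope is per-epoch randomness
    a = share(key, slope, x=1)  # first message this epoch: fine
    b = share(key, slope, x=2)  # second message this epoch: fatal
    assert recover(a, b) == key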

Regarding shortest path: interesting, I thought it was. Guess I was wrong. Still, it is a valid way to do routing and is how I'd intended to do it in the prototype network. The big idea of versioning the operators is to be able to converge on the desired semantics.

The reason I focus on the proof-of-trust part is that it is how a bunch of decisions are made (such as the amount of bandwidth to allocate). You still need a bunch of other constructions to use this data.

The anti-sybil mechanism is not just anti-spam; it is also about finding correlated intent and packaging that whole echo chamber up as an individual so that you can rate-limit / adjust visibility. Major news from an uninteresting group may be more interesting than minor news from an interesting group.

Regarding "making up words" - there are precise definitions that I didn't reproduce because I figured you would want to hear a different narration than the one presented in the paper / the channel.

The way nodes participate in the network is by pinning content addresses and gossiping metadata about pins. At any given time the "commons" is made explicit because it is pinned by a bunch of nodes. The nodes build metadata on top of the bottom (the commons); when a consensus is reached, i.e. there is some metadata that everyone has joined into their state, then we have a meet consensus and the pin can be moved.

You are still thinking in terms of first order BFT, which is why your criticism is a bit off base. The network is not trying to start off automatic but rather manual; discovered patterns will then be automated once proven correct.

If you read (and understand) how the amoveo oracles work and look into the properties of prediction markets, then you can see that this is actually a social network built on a data-interchange format that is optimized for building authenticated datastructures (i.e. amenable to automation of any first order problems that can be solved with MPC).

Edit: W.r.t. Datafun "trying to" do type inference: Datafun is a research project; its author has found most of the type inference rules, but some are still missing. The language is not really usable as-is, but it "tries to" be; carrying around two typing contexts (one for lattice types and another for algebraic types) and having modal subtyping and such things is complicated and hard to do well. Datalisp is less interested in such things and is closer to being a probabilistic datalog. Again, the versioning of operators is so that we can converge on desired semantics; the mvp is going to use Bayes factors for inferences and odds of signal:noise for priors attached to facts. This is just for pragmatic reasons (easy to code / reason about) and is probably insufficient for the economic model and composition of prediction markets.
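To spell out that mvp arithmetic (assumed, not final): a prior attached to a fact is odds of signal:noise, and each inference just multiplies in a Bayes factor.

    def update_odds(prior_odds: float, bayes_factor: float) -> float:
        # posterior odds = prior odds * Bayes factor
        return prior_odds * bayes_factor

    def odds_to_prob(odds: float) -> float:
        return odds / (1.0 + odds)

    prior = 1.0  # 1:1 signal:noise
    bf = 4.0     # evidence 4x likelier under "signal" than "noise"
    post = update_odds(prior, bf)
    print(post, odds_to_prob(post))  # 4.0 0.8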



