I assume the Lineage-Driven Fault Injection work [1] has some overlap with Eve's ability to tell you "Why is this blank?": the Datalog model lets you trace the logical dependencies of a result.
Some other Bloom-related links:
- Anna KVS[2] showed up recently on Hacker News[3] and The Morning Paper[4]
- Lasp lang is in the same space[5]; Christopher Meiklejohn has a comparison with Bloom[6]
But it requires "Ruby (MRI) 1.8.7 and 1.9. Bud also has experimental support for Ruby 2.", which means it's on life support. Fair enough for a research project started when 1.8.7 was current, but not much good for doing real work today.
This is a fascinating language with some great ideas. I tried years ago to create my own implementation in Python, but while I found the paper straightforward, the Bud implementation was, to me at the time, quite dense.
It'd be interesting to revisit it with some of the advances we've made in deep specifications and dependent type theory.
The short summary is that Bloom's position is that as programmers we have been trained -- to our detriment -- to think too much in terms of order. Thinking of algorithms as a sequence of operations on data is unnecessarily constraining and hard to parallelize, particularly in reactive systems.
Elixir's semantics are functional. Just as with any conventional language, it is your job to write code that tracks changes to some data structure and explicitly propagates those changes to other parts. For example: read from a socket into a buffer, parse and transform it, then write formatted HTML to the output. You are explicitly sequencing the order of computation.
Bloom's semantics are based on logic, so it may seem unfamiliar to the average programmer. It is an enhancement of Datalog (itself a subset of Prolog). Its rules are declarative, just as in a spreadsheet. For example, say you enter a formula in Excel for cell A1, "=B1 * 100 + C1". This is declarative: you merely declare what needs to happen, not how the internal data structures are traversed and in which order. It is "understood" (that's the semantics part) that when B1 or C1 change, A1 will be automatically recomputed. Now imagine B1 has a formula that depends on C1. When C1 changes, there are possibly two ways of calculating the value of A1: (i) old value of B1 + new C1, or (ii) new value of B1 + new C1. Spreadsheets resolve this problem by doing a topological sort of cell dependencies and updating cells in that order; that is to say, the topological sort is part of the spreadsheet semantics. In the example above, the second option will be chosen in all spreadsheets. The limitation, of course, is that you cannot have B1 refer to C1 while C1 also refers back to B1. Cycles are not permitted.
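Roughly, in Python (the A1 formula is the one from the example; the B1 formula, the cell values, and the dependency map are made up for illustration):

    # Toy spreadsheet recalculation: evaluate cells in topological order so
    # that A1 always sees the *new* value of B1 (cycles would be rejected).
    from graphlib import TopologicalSorter

    values = {"C1": 5, "B1": 0, "A1": 0}
    formulas = {
        "B1": lambda v: v["C1"] * 2,               # assumed formula: B1 depends on C1
        "A1": lambda v: v["B1"] * 100 + v["C1"],   # the "=B1 * 100 + C1" example
    }
    deps = {"B1": {"C1"}, "A1": {"B1", "C1"}}      # cell -> cells it reads

    def recalc(cell, new_value):
        values[cell] = new_value
        for c in TopologicalSorter(deps).static_order():   # C1, then B1, then A1
            if c in formulas:
                values[c] = formulas[c](values)

    recalc("C1", 7)
    print(values)   # {'C1': 7, 'B1': 14, 'A1': 1407} -- option (ii): new B1, new C1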
Bloom (and Datalog) can have mutually dependent rules, so that if one datum changes, everything dependent on it also changes, which may cause that first datum to change yet again, since it in turn depends on the others. Bloom has so-called "fixpoint semantics", which means that the infrastructure keeps evaluating these rules (read: formulas) in a loop until there is no more change to any datum, i.e. it has reached a "fixpoint". As it turns out, the semantics are such that these loops are always guaranteed to terminate. Bloom has only one data structure, the table, whose values are primitives or lattices (the stuff that CRDTs are built on). Bloom's operators let you write rules that react to changes to any table and automatically update dependent tables. No explicit sequencing of code is needed; the evaluation order is part of the language's semantics.
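A naive sketch of that fixpoint loop in Python, with plain sets standing in for Bloom tables and a hand-written function standing in for the rules (the link/path rules here are just a stock Datalog example, not anything from Bud):

    # Naive fixpoint evaluation: keep re-running the rules until no table grows.
    # "link" and "path" are stand-in tables; the rules compute a transitive closure.
    link = {("a", "b"), ("b", "c"), ("c", "d")}
    path = set()

    def run_rules():
        derived = set(link)                               # path <= link
        derived |= {(x, z) for (x, y) in link             # path <= join(link, path)
                           for (y2, z) in path if y == y2}
        return derived

    while True:
        new_path = path | run_rules()    # the only "update" is a monotone set union
        if new_path == path:             # nothing changed: we've reached a fixpoint
            break
        path = new_path

    print(sorted(path))   # link plus ('a','c'), ('a','d'), ('b','d')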
Bloom adds two features to Datalog that make it suitable for reactive computation. The first is time, which is not clock time but a round of computation; in each round, all pending input events are processed until there are no more changes, and some operators let you defer changes to the next round. The second is transient tables, whose values are only valid for a single round. All I/O is based on special transient tables called channels; these accumulate the events that have to be processed (received or transmitted) in a given round.
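Very loosely, the shape of a timestep might look like this (this is not Bud's API, just a hand-rolled sketch of the round/defer/channel idea):

    # One "timestep": drain the transient channel, merge into persistent state,
    # and push deferred facts into the *next* round.
    from collections import deque

    channel = deque()    # transient: its contents only exist for one round
    seen = set()         # persistent table
    deferred = []        # updates postponed to the next timestep

    def tick():
        inbound = set()
        while channel:                      # consume this round's input events
            inbound.add(channel.popleft())
        seen.update(inbound)                # ordinary same-timestep merge
        for msg in inbound:
            deferred.append(("ack", msg))   # a rule whose output is deferred

    def next_round():
        channel.extend(deferred)            # deferred facts arrive as fresh input
        deferred.clear()

    channel.extend(["hello", "world"])
    tick(); next_round(); tick()
    print(seen)   # contains 'hello', 'world', ('ack', 'hello'), ('ack', 'world')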
I'm inexperienced with this kind of system, so my views might be quite naïve... but my first concern with a language like this would be that it's hard to reason about memory usage and performance. Ultimately your code needs to run on a real system with limited resources. What have your experiences been in that regard? Have you found it's a worthwhile tradeoff?
As a point of comparison, when I learned Haskell I really enjoyed it and found it made expressing certain algorithms much simpler, but I basically kept all resource constraints out of mind.
1. Bloom and its descendants/companions are research projects first and foremost; somewhere down the line, somebody will probably sit down and create a super-optimized reactive database based on this research, in Rust or something.
2. Our trade has been moving from managing constraints on performance and memory toward making correct programs as easy as possible to write, resources be damned. This is just the next step, alongside GC-managed memory or Haskell-like lazy evaluation.
That's right, not in a spreadsheet. Excel will reject it; the cell dependencies have to form a directed acyclic graph.
But in Bloom, the only data structure you have available is a lattice, whose change operators have special properties. Consider a prominent example of a lattice, namely sets.
The only way to add to a set is to union in a new set (of changes). So following the spirit of your example, suppose there are ΔC items to be added to C. This results in some rules getting triggered that generate ΔB changes to B, which in turn produces (possibly) fresh ΔC changes to C and so on in a loop.
You will find that given the operators, soon there will come a time when the deltas add nothing new to either table; the computation stops at that point. B and C have mutually coevolved to a new state.
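A delta-flavoured sketch of that co-evolution, in Python (the two "copy from the other table" rules are made up to mirror the B1/C1 example):

    # Delta-driven version of the loop above: only the *new* items get pushed
    # around, and a branch stops as soon as its delta adds nothing.
    B, C = set(), set()
    tables = {"B": B, "C": C}
    rules = [("B", "C"), ("C", "B")]    # (target, source): target gets whatever source gets

    def propagate(target, delta):
        pending = [(target, set(delta))]
        while pending:
            tbl, d = pending.pop()
            fresh = d - tables[tbl]         # discard what the table already has
            if not fresh:
                continue                    # empty delta: this chain of rules is done
            tables[tbl] |= fresh
            for dst, src in rules:
                if src == tbl:              # fire the rules that read the changed table
                    pending.append((dst, fresh))

    propagate("C", {"x", "y"})    # the ΔC from the example
    print(tables)                 # B and C have both absorbed {'x', 'y'} and stopped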
Lattices are basically data structures (like sets) fitted with a partial order, and a merge operator (like set union) that has these three simple properties (there's a quick check of them in the sketch after this list):
idempotent: merging the same stuff into a set again does not change the set
commutative: two sets can be merged either way (A union B == B union A)
associative: the order of merges doesn't matter: A union (B union C) == (A union B) union C
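These are easy to check mechanically for set union (a throwaway Python sketch):

    # Mechanical check of the three properties for the set-union merge:
    import random

    def merge(a, b):
        return a | b

    def random_set():
        return {random.randint(0, 9) for _ in range(random.randint(0, 5))}

    for _ in range(1000):
        A, B, C = random_set(), random_set(), random_set()
        assert merge(A, A) == A                                 # idempotent
        assert merge(A, B) == merge(B, A)                       # commutative
        assert merge(A, merge(B, C)) == merge(merge(A, B), C)   # associative
    print("union is a valid lattice merge: repetition and order don't matter")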
As an aside, these are the building blocks of CRDTs, which is why you get eventual consistency: all replicas eventually converge to the same result.
Edit: The merge operators are order-independent, and the evaluation semantics don't care about the order in which rules are written down in the program. The Bloom docs speak of "disorder", which is a bit misleading, albeit catchy.
I think I don't understand something important from your explanation, since I still can't see how the GP comment's loop is possible in Bloom. Not even just "possible", but how it is implemented so that it is predictable and always halts.
A more realistic example could be an invoice row with a tax or discount, where (discount / (price + discount)) != discount_percent, due to .00 rounding and IEEE 754 problems in general. How do such systems resolve these discrepancies? Is the stopping point arbitrary or controllable?
edit: i.e. 3% of 11.99 = 0.3597, which is 0.0003 off from 0.36
Recall that in Bloom, dataflow operations are on tables (think of B1 and C1 as tables, not cells). It makes no sense to add 1 to a table.
But you can add another set. Say the rules are
B = C + D
C = B + E
where each of these are tables, and + represents union. If you change D (add 1 row), then B will get that row. In turn C will get that row (second rule). Now the computation stops because C has nothing to give B that it doesn't already have.
In effect, B and C will always have all of D and E.
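Spelled out in Python (sets and unions standing in for Bloom tables and rules):

    # The two rules above, run to a fixpoint (+ is set union here):
    D, E = {1, 2}, {3}
    B, C = set(), set()
    while True:
        new_B = C | D                     # B = C + D
        new_C = B | E                     # C = B + E
        if new_B == B and new_C == C:
            break                         # no rule can add anything: fixpoint
        B, C = new_B, new_C

    print(B, C)   # {1, 2, 3} {1, 2, 3} -- both contain all of D and E
    # Adding a row to D and re-running the loop converges again after one pass
    # of the new row through B and then C.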
A fixpoint is when things have stopped changing. When it comes to floating point (fixed point vs. fixpoint!), one can have an epsilon beyond which all changes are deemed negligible. If that epsilon has more precision than you need, you have a deterministic answer. Bloom does not implement that out of the box, but you can implement your own lattice types to plug into the system.
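Bloom doesn't ship such a thing, but a user-defined numeric lattice with an epsilon could look roughly like this sketch (the max-based merge and EPS value are invented for illustration):

    # Sketch of an epsilon-tolerant numeric "lattice": merge keeps the max,
    # but a change smaller than EPS counts as no change at all, so the
    # fixpoint check can still declare convergence.
    EPS = 1e-9

    class MaxLattice:
        def __init__(self, value=float("-inf")):
            self.value = value

        def merge(self, other):
            """Merge in a new observation; return True only on a real change."""
            if other - self.value > EPS:
                self.value = other
                return True
            return False

    x = MaxLattice()
    print(x.merge(1.0))          # True: a genuine change
    print(x.merge(1.0 + 1e-12))  # False: below EPS, treated as "no change"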
That mostly makes sense, so I apologize if this is pedantic, but:
Is it not possible for a set to be logically infinite in Bloom? For instance, "the set of all integers"? If so, how does it determine the limits of set sizes - is it connected to bit widths? E.g., "the set of all 8-bit integers"?
You can guarantee termination by limiting bit precision, but it is more interesting to limit the set of operators your program uses instead.
For instance, you might have an operator that translates from hostname to IPv4 address.
That function can return billions of different numbers, but if your program’s input only has 10 hostnames in it (and you don’t use string manipulation or other tricks to fabricate hostnames) then you can infer that the IP lookup function only adds 10 new symbols to the sets that your program maintains.
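A sketch of that bound in Python (the FAKE_DNS table and lookup_ip function are made up; the point is just that the single rule can never add more facts than there are hostnames):

    # The only rule applies a lookup to hostnames already in the input, so the
    # derived table can never outgrow the input.
    FAKE_DNS = {"a.example": "10.0.0.1", "b.example": "10.0.0.2"}

    def lookup_ip(host):
        return FAKE_DNS[host]

    hostnames = {"a.example", "b.example"}   # the program's entire input
    addresses = set()

    while True:
        derived = {(h, lookup_ip(h)) for h in hostnames}   # the one and only rule
        if derived <= addresses:
            break        # nothing new can ever appear: at most len(hostnames) facts
        addresses |= derived

    print(addresses)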
Searching for "Herbrand universe" will produce formal writeups of these ideas.
There is one aspect that is infinite, which is along the time axis. But the way the logic works is that you are never in a position to enumerate all possible behaviours. This is a very superficial answer, and perhaps you'll gain more insight from this paper on the logical semantics:
"Dedalus: Datalog in Time and Space"
http://db.cs.berkeley.edu/papers/datalog2011-dedalus.pdf
This sounds fascinating. Is Bloom worth spending some time on with no specific goal, as I did some years ago by solving Project Euler problems with J and APL (only to improve my skills and discover new concepts)? Is there an active community around the language?
And no, there's no active community, but there are a lot of people in the distributed systems world who have been affected by the ideas, judging from back references to that project. For me, it was the importance of monotonic operators (which result in coordination-free design), and the relatively few places where a non-monotonic operation is required (which is when coordination/consensus is needed). Also, the centrality of lattices in achieving monotonicity was an eye-opener.
Here is Peter Alvaro's Strange Loop talk -
https://www.youtube.com/watch?v=R2Aa4PivG0g
It was a shame that most people, seemingly the Eve devs as well, focused on making Eve a language for easy programming for the masses. That's a huge wall of a problem with a limited funding slope.
I think if there had been more upfront work and focus on the distributed programming/database possibilities, Eve would have found an actual community to push it along.
Eve as a language for doing business logic against, say, Kubernetes? As a language to wire up ETL processes?
I think it could have found a niche that would have made it grow in power before trying to take on the mass market.
Just a thought.