As far as compiling continuations goes, sequent calculus (specifically as a compiler IR) is an interesting research direction. See Grokking the Sequent Calculus (Functional Pearl) from ICFP 2024: https://dl.acm.org/doi/abs/10.1145/3674639
"This becomes clearer with an example: When we want to evaluate the expression (2 + 3) ∗ 5,
we first have to focus on the subexpression 2 + 3 and evaluate it to its result 5. The remainder
of the program, which will run after we have finished the evaluation, can be represented with
the evaluation context □ ∗ 5. We cannot bind an evaluation context like □ ∗ 5 to a variable in
the lambda calculus, but in the λμ~μ-calculus we can bind such evaluation contexts to covariables.
Furthermore, the μ-operator gives direct access to the evaluation context in which the expression
is currently evaluated.
Having such direct access to the evaluation context is not always necessary for a programmer who wants to write an application, but it is often important for compiler implementors who write optimizations to make programs run faster. One solution that compiler writers use to represent evaluation contexts in the lambda calculus is called continuation-passing style. In continuation-passing style, an evaluation context like □ ∗ 5 is represented as a function λ x . x ∗ 5. This solution works, but the resulting types which are used to type a program in this style are arguably hard to understand. Being able to easily inspect these types can be very valuable, especially for intermediate representations, where terms tend to look complex. The promise of the λμ~μ-calculus is to provide the expressive power of programs in continuation-passing style without having to deal with the type-acrobatics that are usually associated with it."
"In some sense the Sequent Calculus (SC) is quite similar to a CPS based IR. To sum up the benefits of SC over CPS in two concepts, I would say: Symmetry and non-opaque continuations.
Symmetry: A lot of pairs of dual concepts are modelled very naturally and symmetrically in SC: call-by-value and call-by-name, data types and codata types, exception based error handling and Result-type error handling, for example.
Non-Opaque continuations: Instead of continuations in CPS which are just a special kind of function whose internals cannot be inspected (easily), SC introduces consumers whose internal structure can be inspected easily, for example when writing optimization passes."
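To make the "opaque vs. inspectable" contrast concrete, here is a small sketch of my own (in Python rather than the paper's term syntax, with made-up names): the context □ ∗ 5 first as a CPS continuation, then as a consumer whose structure an optimization pass can pattern-match on.

    # Sketch (not from the paper): the evaluation context [] * 5 written two ways.

    # 1. CPS: the context is an opaque function. An optimizer holding `k`
    #    can only call it; it cannot look inside to see "multiply by 5".
    def eval_plus_cps(a, b, k):
        return k(a + b)

    result = eval_plus_cps(2, 3, lambda x: x * 5)   # 25

    # 2. Sequent-calculus style: the context is a *consumer* with visible
    #    structure, so a pass can pattern-match on it and rewrite it.
    from dataclasses import dataclass

    @dataclass
    class MulWith:          # the consumer "[] * 5"
        factor: int
        rest: "Consumer"

    @dataclass
    class Done:             # the top-level consumer
        pass

    Consumer = MulWith | Done

    def run(value, consumer):
        match consumer:
            case MulWith(factor, rest):
                return run(value * factor, rest)
            case Done():
                return value

    result2 = run(2 + 3, MulWith(5, Done()))        # 25

    # A peephole pass can now spot MulWith(1, rest) and drop it,
    # something it cannot do with the opaque lambda above.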
Bingo. The best trials are those that allow the user to determine whether the product is capable of solving the user’s immediate problem without actually solving it unless the product is purchased.
I have multiple system prompts that I use before getting to the actual specification.
1. I use the Socratic Coder[1] system prompt to have a back and forth conversation about the idea, which helps me hone the idea and improve it. This conversation forces me to think about several aspects of the idea and how to implement it.
2. I use the Brainstorm Specification[2] user prompt to turn that conversation into a specification.
3. I use the Brainstorm Critique[3] user prompt to critique that specification and find flaws in it which I might have missed.
4. I use a modified version of the Brainstorm Specification user prompt to refine the specification based on the critique and have a final version of the document, which I can either use on my own or feed to something like Claude Code for context.
Doing those things improved the quality of the code and work spit out by the LLMs I use by a significant amount, but more importantly, it helped me write much better code on my own because I now have something to guide me, whereas before I used to go in blind.
As a bonus, it also helped me decide whether an idea was worth it or not; there are times when I'm talking with the LLM and it asks me questions I don't feel like answering, which tells me I'm probably not as into that idea as I initially thought; it was just my ADHD hyperfocusing on something.
This isn't quite right. The key thing is that Pony uses the actor model, where an "actor" is an object, a green thread, and an MPSC queue, all bundled together into a single conceptual unit. These MPSC queues are the only synchronization primitive; there aren't mutexes (which means that Pony programs can't internally deadlock, though they can livelock). For this reason, for any given reference (pointer) to a piece of data, you can have any two of mutation, aliasing, and concurrency (i.e., sending the reference to another actor's queue, which doesn't count as mutating the actor). But you can't have all three, because that would allow data races.
Consequently, three "reference capabilities" fall out of this design:
- "iso": allows mutation and concurrency, but not aliasing.
- "val": allows aliasing and concurrency, but not mutation.
- "ref": allows mutation and aliasing, but not concurrency.
The other three are more for generic kinds of programming or to facilitate more complicated tricks:
- "box": only allows aliasing, without mutation or concurrency. Subtype of both val and ref.
- "trn": allows mutation and aliasing, but the aliases are box and so don't themselves allow mutation. Also, you can subsequently change it to either ref or val, to get either mutable aliasing or concurrency (but not both).
- "tag": allows aliasing and concurrency, but not mutation, and (unlike any of the others) also doesn't allow reading the data. The only things you can do are pointer comparisons, and sending something to the referent's queue if the referent is an actor (again, this doesn't count as mutating the actor). Subtype of all the other ones; they all allow tag aliases even if they don't otherwise allow aliasing.
A few years ago David Fifield invented a technique which provides a million-to-one non-recursive expansion, by overlapping the file streams: https://www.bamsoftware.com/hacks/zipbomb/
Anyone who has long experience with neural networks, LLM or otherwise, is aware that they are best suited to applications where 90% is good enough. In other words, applications where some other system (human or otherwise) will catch the mistakes. This phrase: "It is not entirely clear why this episode occurred..." applies to nearly every LLM (or other neural network) error, which is why it is usually not possible to correct the root cause (although you can train on that specific input and a corrected output).
For some things, like say a grammar correction tool, this is probably fine. For cases where one mistake can erase the benefit of many previous correct responses, and more, no amount of hardware is going to make LLMs the right solution.
Which is fine! No algorithm needs to be the solution to everything, or even most things. But much of people's intuition about "AI" is warped by the (unmerited) claims in that name. Even as LLMs "get better", they won't get much better at this kind of problem, where 90% is not good enough (because one mistake can be very costly), and problems need discoverable root causes.
"Great news, boss! We invented this new tool that allows nontechnical people to write code in English! Now anyone can deploy applications, and we don't have to hire all those expensive developers!"
The article doesn't explicitly state it this concisely in one place, but back when I was doing competitive programming, the "practical/easy-to-remember" way I always thought about A* is that BFS, Dijkstra, and A* are all the same algorithm, just with different priorities on the priority queue:
Breadth-first Search: Priority is order of discovery of edges (that is, no priority queue/just a regular queue)
Dijkstra: Priority is distance so far + next edge distance
A*: Priority is distance so far + next edge distance + estimate of distance to target node.
This also helps me remember whether the estimate must over- or under-estimate: since Dijkstra is making the estimate "0", clearly the "admissible heuristic" criterion must be an under-estimation.
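A minimal Python sketch of that "same loop, different priority" view (the graph shape and names are my own, purely for illustration):

    import heapq

    def shortest_path_cost(graph, start, goal, heuristic=lambda n: 0):
        """Generic best-first search over graph = {node: [(neighbor, edge_cost), ...]}.

        heuristic(n) == 0                       -> Dijkstra
        heuristic(n) = admissible estimate to goal (never over-estimating) -> A*
        (BFS is the same loop with a plain FIFO queue and unit edge costs.)
        """
        frontier = [(heuristic(start), 0, start)]   # (priority, cost_so_far, node)
        best = {start: 0}
        while frontier:
            _, cost, node = heapq.heappop(frontier)
            if node == goal:
                return cost
            if cost > best.get(node, float("inf")):
                continue                            # stale queue entry
            for neighbor, edge_cost in graph.get(node, []):
                new_cost = cost + edge_cost
                if new_cost < best.get(neighbor, float("inf")):
                    best[neighbor] = new_cost
                    heapq.heappush(frontier,
                                   (new_cost + heuristic(neighbor), new_cost, neighbor))
        return None

    # Example: with the default heuristic of 0, A* degenerates to Dijkstra.
    g = {"a": [("b", 1), ("c", 4)], "b": [("c", 1)], "c": []}
    assert shortest_path_cost(g, "a", "c") == 2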
* direct causal contact with the environment, e.g., the light from the pen hits my eye, which induces mental states
* sensory-motor coordination, i.e., that the light hitting my eye from the pen enables coordination of the movement of the pen with my body
* sensory-motor representations, i.e., my sensory-motor system is trainable, and trained by historical environmental coordination
* hierarchical planning in coordination, i.e., these sensory-motor representations are goal-contextualised, so that I can "solve my hunger" in an infinite number of ways (I can achieve this goal against an infinite permutation of obstacles)
* counterfactual reality-oriented mental simulation (aka imagination) -- these rich sensory-motor representations are reifiable in imagination, so I can simulate novel permutations of the environment, possible shifts in physics, and so on. I can anticipate this infinite number of obstacles before any have occurred, or have ever occurred.
* self-modelling feedback loops, i.e., that my own process of sensory-motor coordination is an input into that coordination
* abstraction in self-modelling, i.e., that I can form cognitive representations of my own goal-directed actions as they succeed/fail, and treat them as objects of refinement in their own right
* abstraction across representational mental faculties into propositional representations, i.e., that when I imagine that "I am writing", the object of my imagination is the very same object as the action "to write" -- so I know that when I recall/imagine/act/reflect/etc. I am operating on the very same objects of thought
* faculties of cognition: quantification, causal reasoning, discrete logical reasoning, etc., which can be applied at the sensory, motor, and abstract conceptual levels (i.e., I can "count in sensation" a few objects, also in action, also in intellection)
* concept formation: abduction, various varieties of induction, etc.
* concept composition: recursion, composition in extension of concepts, composition in intension, etc.
One can go on and on here.
Describe only what happens in a few minutes of the life of a toddler as they play around with some blocks, and you have listed, rather trivially, a vast universe of capabilities that an LLM lacks.
To believe an LLM has anything to do with intelligence is to have quite profoundly mistaken what capabilities are implied by intelligence -- what animals have, some more than others, and a few even more so. To think this has anything to do with linguistic competence is a profoundly strange view of the world.
Nature did not produce intelligence in animals in order that they acquire competence in the correct ordering of linguistic tokens. Universities did, to some degree, produce computer science departments for this activity however.
There's a library here that implements a lot of database features and can be used on top of any sorted transactional K/V store, with FoundationDB being an exemplar backend:
It's pretty sophisticated, but the author uses it in his own projects and then just open sources it without trying to build a community around it so you may have to dig in to see that. It gives object mapping, indexing, composite indexing, triggers, query building and so on.
It's not "hard" to implement this stuff per se but it's a lot of work, especially to build enough test coverage to be convincing. I used to be quite into this idea of FoundationDB layers and especially Permazen, which I think is easily one of the best such layers even though it's not well known. I even wrote a SQL binding using Calcite so you could query your Permazen object stores in FDB using SQL!
I will say though, that in the recent past I started a job at Oracle Labs where I ended up using their database in a project, and that kind of gave me a new perspective on all this stuff. For example: scaling. Like a lot of people who spend too much time on Hacker News I used to think Postgres was state of the art, that RDBMS didn't scale well by design, and if you wanted one that did you'd need to use something exotic like layers on a FoundationDB cluster. But no. FoundationDB scales up to a few hundred nodes at most, and Oracle RAC/ExaData clusters can scale up that far too. There are people storing data from particle accelerators in ExaData clusters. The difference is the latter is a full SQL database with all the features you need to build an app right there already, instead of requiring you to depend on questionably well maintained upper layers that are very feature-light.
One place this hits you immediately is joins. Build out an ExaData cluster and you can model your data naturally whilst joining to your heart's content. The DB has lots of ways that it optimizes complex queries, e.g. it pushes down predicates to the disk servers, it can read cached data directly out of other nodes' RAM over RDMA on a dedicated backbone network, and a whole lot more. Nearly every app requires complex queries, so this is a big deal. If you look at FoundationDB layers, then, well:
Now in the last few years FoundationDB added support for a very, very simple kind of push-down predicate, in which a storage server can dereference a key to form another key. But if you look closely, (a) it's actually a layering violation in which the core system understands data formats used by the Record layer specifically, so it messes up their nice architecture, (b) the upper layers don't really support it anyway, and (c) this is very, very far from the convenience or efficiency of a SQL join.
Another big problem I found with modeling real apps was the five second transaction timeout. This is not, as you might expect, a configurable value. It's hard-coded into the servers and clients. This turns into a hugely awkward limitation and routinely wrecks your application logic and forces you to implement very tricky concurrency algorithms inside your app, just to do basic tasks. For example, computing most reports over a large dataset does not work with FoundationDB because you can't get a consistent snapshot for more than five seconds! There are size limits on writes too. When I talked to the Permazen author about how he handled this, he told me he dumps his production database into an offline MySQL in order to do analytics queries. Well. This did cool my ardour for the idea somewhat.
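For what it's worth, the workaround usually ends up looking roughly like the sketch below (my own, assuming the standard fdb Python bindings and a made-up key prefix): you chunk the scan into many short transactions, and the price is that the chunks together are no longer a consistent snapshot.

    # Sketch only: a chunked, non-atomic range scan, the kind of thing the
    # five-second limit forces on you when a report needs to read more data
    # than one transaction can cover.
    import fdb

    fdb.api_version(630)
    db = fdb.open()

    @fdb.transactional
    def read_chunk(tr, begin, end, limit=10_000):
        # Each call is its own short transaction, so no chunk risks the
        # timeout, but successive chunks may see different versions.
        return list(tr.get_range(begin, end, limit=limit))

    def scan_range(db, begin, end):
        while True:
            chunk = read_chunk(db, begin, end)
            if not chunk:
                return
            yield from chunk
            # Resume just past the last key we saw.
            begin = fdb.KeySelector.first_greater_than(chunk[-1].key)

    # "report/" is a made-up prefix; "report0" is the byte just past it.
    total_rows = sum(1 for _ in scan_range(db, b"report/", b"report0"))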
There are nonetheless two big differences or advantages to FoundationDB. One is that Apple has generously made it open source, so it's free. If you're the kind of guy who is willing to self-support a self-hosted KV storage cluster without any backing from the team that makes it, this is a cost advantage. Most people aren't though so this is actually a downside because there's no company that will sell you a support contract, and your database is the core of the business so you don't want to take risks there usually. The second is it supports fully serializable transactions within that five second window, which Oracle doesn't. I used to think this was a killer advantage, and I still do love the simplicity of strict serializability, but the five second window largely kills off most of the benefits because the moment you even run the risk of going beyond it, you have to break up your transactions and lose all atomicity. It also requires care to achieve full idempotency. Regular read committed or snapshot isolation transactions offer a lower consistency level, but they can last as long as you need, don't require looping and in practice that's often easier to work with.
I see a lot of threads pitting models against each other (or whole swarms of them) in the hope that "wisdom of crowds" will magically appear. After a stack of experiments of my own, and after watching the recent ASU/Microsoft Research work [1], I've landed on a simpler takeaway:
An LLM is a terrible verifier of another LLM.
Subbarao Kambhampati's "(How) Do LLMs Reason/Plan?" talk shows GPT-4 confidently producing provably wrong graph-coloring proofs until a symbolic SAT solver is introduced as the referee [1]. Stechly et al. quantify the problem: letting GPT-4 critique its own answers *reduces* accuracy, whereas adding an external, sound verifier boosts it by ~30 pp across planning and puzzle tasks [2]. In other words, verification is *harder* than generation for today's autoregressive models, so you need a checker that actually reasons about the world (compiler, linter, SAT solver, ground-truth dataset, etc.).
Because of that asymmetry, stacking multiple LLMs rarely helps. The "LLM-Modulo" position paper argues that auto-regressive models simply can't do self-verification or long-horizon planning on their own and should instead be treated as high-recall idea generators wrapped by a single, sound verifier [3]. In my tests, replacing a five-model "debate" with one strong model + verifier gives equal or better answers with far less latency and orchestration overhead.
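As a concrete illustration of that "generator wrapped by a sound verifier" shape, here's a minimal sketch (my own; propose_coloring is a stand-in for whatever LLM call you use, and the checker is the only part you trust):

    # Illustrative sketch of the LLM-Modulo shape: one fallible generator,
    # one sound checker. `propose_coloring` is a stand-in for an LLM call.
    from typing import Callable

    Graph = dict[int, set[int]]

    def is_valid_coloring(graph: Graph, coloring: dict[int, int]) -> bool:
        """Sound verifier: every node is colored and no edge joins two nodes of the same color."""
        return all(
            coloring.get(u) is not None
            and all(coloring.get(u) != coloring.get(v) for v in neighbors)
            for u, neighbors in graph.items()
        )

    def solve(graph: Graph,
              propose_coloring: Callable[[Graph, str], dict[int, int]],
              max_rounds: int = 5) -> dict[int, int] | None:
        feedback = ""
        for _ in range(max_rounds):
            candidate = propose_coloring(graph, feedback)   # high-recall, unsound
            if is_valid_coloring(graph, candidate):         # the only trusted step
                return candidate
            feedback = f"invalid: {candidate!r}"            # back-prompt and retry
        return None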
You can start right now with an algorithm I learned from an expert when I was working in a landscaping business.
It is a very simple three-pass plan: "Deadwood, Crossovers, Aesthetics".
So, first pass, go through the tree cutting out only and all the dead branches. Cut back to live stock and, as always, make good clean cuts at a proper angle (many horticulture books will provide far better instructions on this).
Second pass, look only for branches that cross over other branches, especially those that show rubbing or friction marks against other branches. Cut the ones that are either least healthy or grow in the craziest direction (i.e., deviating most from the normal more-or-less radial growth away from the trunk).
Then, and only after the other two passes are complete, start pruning for the desired look and/or size & shape for planned growth or bearing fruit.
This method is simple and saves a LOT of trees from being ruined by trying to cut to size and appearance first; by the time the deadwood and crossovers are taken out later, the tree is a scraggly mess that takes years to grow back. And it even works well for novices, as long as they pay attention.
I'd suspect entering the state and direction of every branch into an app would take longer than just pruning with the above method, although for trees that haven't fully leafed out, perhaps a 360° set of drone pics could make an adequate 3D model to use for planning?
In any case, good luck with your fruit trees — may they grow healthy and provide you with great bounty for many years!
We have apps for basically every platform. Our PWA even supports IE 11!
You can use the WP1 tool which I'm the primary maintainer of (https://wp1.openzim.org/#/selections/user) to create "selections" which let you have your own custom version of Wikipedia, using categories that you define, WikiProjects, or even custom SPARQL queries.
I was feeling a bit like the Petunia and thought "Oh no, not again." :-) One of the annoyances of embedded programming can be having the wheel re-invented a zillion times. I was pleased to see that the author was just describing good software architecture that creates portable code on top of an environment specific library.
For doing 'bare metal' embedded work in C you need the crt0, which is the weirdly named C startup code that satisfies the assumptions the C compiler made when it compiled your code. And a set of primitives to do what the i/o drivers of an operating system would have been doing for you. And voila, your C program runs on 'bare metal.'
Another good topic associated with this is setting up hooks to make STDIN and STDOUT work for your particular setup, so that when you type printf() it just automagically works.
This will also then introduce you to the concept of a basic input/output system or BIOS which exports those primitives. Then you can take that code in flash/eprom and load a binary compilation into memory and start it and now you've got a monitor or a primitive one application at a time OS like CP/M or DOS.
It's a fun road for students who really want to understand computer systems to go down.
This document mentions attribution requirements, but doesn't touch on the questions I'm most interested in with respect to geocoding APIs:
- Can I store the latitude/longitude points I get back from the API in my own database forever, and use them for things like point-in-polygon or point-closest-to queries?
- Can I "resyndicate" those latitude/longitude points in my own APIs?
I've encountered quite a few popular geocoding APIs (including Google's) that disallow both of these if you take the time to read the small print. This massively limits how useful they are: you can build a "show me information about this location right now" feature but if you have a database of thousands of addresses you can't meaningfully annotate data in a way that's valuable in the medium-to-long-term.
The API thing matters because it's not great having location data in your database that you can't expose in an API to other partners!
I find that these conversations on HN end up covering similar positions constantly.
I believe that most positions are resolved if
1) you accept that these are fundamentally narrative tools. They build stories, in whatever style you wish: stories of code, stories of project reports, stories of conversations.
2) this is balanced by the idea that the core of everything in our shared information economy is Verification.
The reason experts get use out of these tools, is because they can verify when the output is close enough to be indistinguishable from expert effort.
Domain experts also do another level of verification (hopefully) which is to check if the generated content computes correctly as a result - based on their mental model of their domain.
I would predict that LLMs are deadly in the hands of people who can't gauge the output, and will end up driving themselves off of a cliff, while experts will be able to use them effectively on tasks where verification of the output has a comparative effort advantage over the task of creating the output.
I can beat this. My wife's maiden name was in the form "Jane Angela Smith". When we got married, she changed it to "Jane Smith Jones", first name Jane, middle name Smith, last name Jones. Someone at the Social Security Administration entered it into their database as first name "Jane", no middle name, last name "Smith Jones".
Now, for fun, no one noticed this for about 25 years. Her Social Security card says "Jane Smith Jones". Her driver's license says "Jones, Jane Smith". Her US passport says "Jones, Jane Smith". But another part of the federal government says "Smith Jones, Jane". We only found this out when she tried to renew her driver's license recently and the clerk was like, "hey, this isn't matching up right...". A month later, the TSA clerk at the airport stopped her to ask why her passport didn't match her federal records.
So now we're paying $400 to legally change her name from "Jane Smith Jones" to "Jane Smith Jones". That's what the notice they make you pay to run in the newspaper says, anyway.
PEGs are extremely difficult to debug when things go wrong because you get a MASSIVE trace (if you're lucky) to sift through. This tool has rewind/playback controls to dig into what the PEG rules are doing.
Edit: and now having read through comments here, Chevrotain's PEG tool looks really impressive, although it doesn't use a simple PEG syntax for the input: https://chevrotain.io/playground/
Whenever I find some mythical free time I'd like to add a video about Constraint Satisfaction, but in the meantime here's my full lecture on CSP: https://www.youtube.com/watch?v=w5tPsbOvkmU
When we think about Explainable AI, we're looking for a "reason the AI made a decision". LLMs are using text prediction to make their decisions, but more traditional AIs use a different heuristic for their selection process. Many of these methodologies are still in use today and get the job done quite well. The "explanation" for these AIs is a (hopefully) intuitive method inspired by other natural phenomena that occur in the world (like evolution, blacksmithing, literally how ants move).
If however you're looking for AI to explain the "why" element, I will admit that is where the traditional algorithms struggle, but I would add that LLMs aren't much more informative. An LLM's reflection is just its guess at what the next right word to say is, which isn't any better or worse than the other models in terms of explainability.
In the past I used the Java SecurityManager to do a PoC PDF renderer that sandboxed the Apache PDFbox library. I lowered the privilege of the PDF rendering component and tested it with a PDF that exploited an old XXE vulnerability in the library. It worked! Unfortunately, figuring out how to do that wasn't easy and the Java maintainers got tired of maintaining something that wasn't widely used, so they ripped it out.
There are some conceptual and social difficulties with this kind of security; at some point I should write about them.
1. People get distracted by the idea of pure capability systems. See the thread above. Pure caps don't work: the idea has been tried, and it's an academic concept that nothing real uses but which sucks up energy and mental bandwidth.
2. People get distracted by speculation attacks. In process sandboxing can't stop sandboxed code from reading the address space, so this is often taken as meaning everything has to be multi-process (like Mojo). But that's not the case. Speculation attacks are read only attacks. To do anything useful you have to be able to exfiltrate data. A lot of libraries, if properly restricted, have no way to exfiltrate any data they speculatively access. So in-process sandboxing can still be highly useful even in the presence of Spectre. At the same time, some libraries DO need to be put into a separate address space, and it may even change depending on how it's used. Mojo is neat partly because it abstracts location, allowing components to run either inproc or outproc and the calling code doesn't know.
3. People get distracted by language neutrality. Once you add IPC you need RPC, and once you add RPC it's easy to end up making something that's fully language neutral so it turns into a fairly complex IDL driven system. The SecurityManager was nice because it didn't have this problem.
4. Kernel sandboxing is an absolute shitshow on every OS. Either the APIs are atrocious and impossible to figure out, or they're undocumented, or both, and none of them provide sandboxing at the right level e.g. you can't allow a process to make HTTP requests to specific hosts because the kernel doesn't know anything about HTTP.
Sandboxing libraries effectively is still IMHO an open research problem. The usability problems need someone to crack it, and it'll probably be language/runtime specific when they do even if libraries like Mojo are sitting there under the hood.
So much for "but deepseek doesn't do multi-modal..." as a defence of the alleged moats of western AI companies.
However many modalities do end up being incorporated, though, it does not change the horizon of this technology, which has progressed only by increasing data volume and variety -- widening the solution class (per problem), rather than the problem class itself.
There is still no mechanism in GenAI that enforces deductive constraints (and compositionality), i.e., situations where, when one output (or input) is obtained, the search space for future outputs is necessarily constrained (and where such constraints compose). Yet all the sales pitches about the future of AI require not merely encoding reliable logical relationships of this kind, but causal and intentional ones: ones where hypothetical necessary relationships can be imposed and then suspended; ones where such hypotheticals are given an ordering based on preferences/desires; ones where the actions available to the machine, in conjunction with the state of its environment, lead to such hypothetical evaluations.
An "AI Agent" replacing an employee requires intentional behaviour: the AI must act according to business goals, act reliably using causal knowledge of the environment, reason deductively over such knowledge, and formulate provisional beliefs probabilistically. However there has been no progress on these fronts.
I am still unclear on what the sales pitch is supposed to be for stochastic AI, as far as big business goes or the kinds of mass investment we see. I buy a 70s-style pitch for the word processor ("edit without scissors and glue"), but not a 60s-style pitch for the elimination of any particular job.
The spend on the field at the moment seems predicated on "better generated images" and "better generated text" somehow leading to "an agent which reasons from goals to actions, simulates hypothetical consequences, acts according to causal and environmental constraints.. " and so on. With relatively weak assumptions one can show the latter class of problem is not in the former, and no amount of data solving the former counts as a solution to the latter.
The vast majority of our work is already automated to the point where most non-manual workers are paid for the formulation of problems (with people), social alignment in their solutions, ownership of decision-making / risk, action under risk, and so on.
The core of the argument is that if accuracy and reliability matter and if verification is hard, then LLMs are inappropriate.
If accuracy and reliability don't matter, go for it.
If they do, but you can easily verify the output, then go for it.
So, yes. The core argument would apply to making inquiries about any ancient religious text, especially if you aren't already able to read the generated text (it's in a language you aren't fluent in) or are unable to access its claimed references to verify what it says they say.
Like you, I was never able to get into Zork. I really enjoyed it, until I inevitably became stuck. Hearing you say this could be permanent really seals it. It's one of those games that's more fun to recollect than to actually play.
FOSS isn't a business model. It never was and never will be.
What Free Software always was is an ethical movement—one which didn't need to prioritize income streams because the point wasn't sustainable development, it was user freedom. Nowhere in "users should have the freedom to do what they want with software" does it say "and we should be able to pay a few developers a salary for their work towards that fundamentally ethical goal". Under the original paradigm and goals, any income streams are just cream on top of doing the right thing.
According to the OSI's history of itself [0], at some point people got it into their heads that the open development model was inherently a good one for business, too—Netscape jumped on board, and then some people got together and decided to rebrand Free Software:
> The conferees decided it was time to dump the moralizing and confrontational attitude that had been associated with "free software" in the past and sell the idea strictly on the same pragmatic, business-case grounds that had motivated Netscape. They brainstormed about tactics and a new label. "Open source", contributed by Chris Peterson, was the best thing they came up with.
The word FOSS reminds me a lot of American corporate Buddhism—mindfulness and meditation totally removed from its original deeply religious context and turned into some sort of self-help program, with the result being something that would be barely recognizable to the original practitioners. Free Software was never about sustainable development. It was about doing the right thing—enabling user freedom—because it is right. Everything else was just means to that end, but at some point along the line the means became the end and we started wondering why FOSS wasn't paying the bills like it was supposed to.
"This becomes clearer with an example: When we want to evaluate the expression (2 + 3) ∗ 5, we first have to focus on the subexpression 2 + 3 and evaluate it to its result 5. The remainder of the program, which will run after we have finished the evaluation, can be represented with the evaluation context □ ∗ 5. We cannot bind an evaluation context like □ ∗ 5 to a variable in the lambda calculus, but in the λμ~μ-calculus we can bind such evaluation contexts to covariables. Furthermore, the μ-operator gives direct access to the evaluation context in which the expression is currently evaluated.
Having such direct access to the evaluation context is not always necessary for a programmer who wants to write an application, but it is often important for compiler implementors who write optimizations to make programs run faster. One solution that compiler writers use to represent evaluation contexts in the lambda calculus is called continuation-passing style. In continuation-passing style, an evaluation context like □ ∗ 5 is represented as a function λ x . x ∗ 5. This solution works, but the resulting types which are used to type a program in this style are arguably hard to understand. Being able to easily inspect these types can be very valuable, especially for intermediate representations, where terms tend to look complex. The promise of the λμ~μ-calculus is to provide the expressive power of programs in continuation-passing style without having to deal with the type-acrobatics that are usually associated with it."
A brief summary of the SC-as-a-compiler-IR vs. CPS from one of the authors of the paper: https://types.pl/@davidb/115186178751455630
"In some sense the Sequent Calculus (SC) is quite similar to a CPS based IR. To sum up the benefits of SC over CPS in two concepts, I would say: Symmetry and non-opaque continuations.
Symmetry: A lot of pairs of dual concepts are modelled very naturally and symmetrically in SC: call-by-value and call-by-name, data types and codata types, exception based error handling and Result-type error handling, for example.
Non-Opaque continuations: Instead of continuations in CPS which are just a special kind of function whose internals cannot be inspected (easily), SC introduces consumers whose internal structure can be inspected easily, for example when writing optimization passes."