
The search space I'm using is that of all functions of a given dependent type. That allows you to narrow the search space by using a strong enough type.

For example, if you search for an `Integer -> Integer -> Integer` function, it will consider Integers of different bit-sizes. But if you instead search for `∀(n: Nat). Int(n) -> Int(n) -> Int(n)`, it will only consider integers of the same bit-size, which is a much smaller space. You can add arbitrary restrictions to shorten your search.
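
As a toy illustration (not the actual search code; here "programs" are just lookup tables over a finite type, and I use commutativity as the example restriction instead of a dependent type), you can see how each constraint prunes the enumeration:

```haskell
-- Toy sketch: stronger constraints shrink an enumerated search space.
-- "Programs" are lookup tables for functions between finite types.
allFuns :: [a] -> [b] -> [[(a, b)]]
allFuns dom cod = mapM (\x -> [(x, y) | y <- cod]) dom

main :: IO ()
main = do
  let bits   = [False, True]
      dom    = [(x, y) | x <- bits, y <- bits]
      space  = allFuns dom bits
      -- example restriction: keep only commutative functions
      comm f = all (\(x, y) -> lookup (x, y) f == lookup (y, x) f) dom
  print (length space)                -- 16 unconstrained candidates
  print (length (filter comm space)) -- 8 after the restriction
```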


So you're enumerating a search space of functions, with some constraints to keep the search tractable?

When you do this kind of thing, there are two worst cases:

a) Your constraints are too strong and the search space does not include the target.

b) The search space includes the target but it is too large to search in polynomial time.

How are you dealing with those?

To clarify, the happy case is when your search target is easy to find, i.e. when the search space is small and includes the target. But that happens... rarely (because program search spaces are large).


Uhm author here. Not sure why this tweet is on Hacker News, as it is just a non-technical "blog post". But I've posted a follow-up today with some code and details, if you're curious:

https://x.com/VictorTaelin/status/1819774880130158663

That's as much as I can share for now though


The only thing I could follow (I don't understand any of this otherwise) was that you take input/output pairs and it gives back a Python function that generates that output. Does that mean many math etc. problems can just be solved? What kind of Python code will it generate to return primes?


It will just return the smallest function that passes your tests.

It works by enumerating ALL possible functions and running them. Obviously, that naive approach is exponential, so the entire point is whether we can apply some clever tricks (based on optimal evaluators) to make this search "slightly less intractable".

So, to answer your question directly: if you asked it to design a prime number generator and it found anything at all, it would probably be a simple, slow trial-and-error algorithm, or something along those lines.
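
To make that concrete, here is a minimal sketch of the enumerate-and-test loop, in Haskell rather than Python, over a toy expression language (my own illustration, not the real enumerator):

```haskell
-- Toy enumerate-and-test: generate expressions in size order, run each
-- against the input/output examples, return the first (smallest) match.
data Expr = X | Lit Int | Add Expr Expr | Mul Expr Expr deriving Show

eval :: Int -> Expr -> Int
eval x X         = x
eval _ (Lit n)   = n
eval x (Add a b) = eval x a + eval x b
eval x (Mul a b) = eval x a * eval x b

-- all expressions of exactly the given size (recomputed naively; fine for a toy)
exprsOfSize :: Int -> [Expr]
exprsOfSize 1 = X : map Lit [0, 1, 2]
exprsOfSize n = [ op a b | k  <- [1 .. n - 1]
                         , a  <- exprsOfSize k
                         , b  <- exprsOfSize (n - k)
                         , op <- [Add, Mul] ]

-- smallest expression consistent with the examples
synth :: [(Int, Int)] -> Expr
synth examples = head [ e | n <- [1 ..], e <- exprsOfSize n
                          , all (\(i, o) -> eval i e == o) examples ]

main :: IO ()
main = print (synth [(1, 3), (2, 5), (3, 7)])  -- finds an expr equal to 2*x + 1
```

The candidate count grows exponentially with program size, which is exactly why those tricks are the whole game.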


I imagine it as content-addressable computation, which implicitly means data too.

Why compute anything more than once? Find by identity and reuse.

This then opens up interesting things when applied in a network that can optimize pathing (like an internet, DHT-like overlay).
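
On a single machine, the core of that idea is just a cache keyed by the input's identity. A minimal sketch (my own guess at the shape, with a local Map standing in for the DHT):

```haskell
import qualified Data.Map as Map
import Data.IORef

-- Compute-at-most-once: results are found by the identity of the input
-- (here the key itself; in a network, a hash of it) and reused.
memoize :: Ord k => (k -> v) -> IO (k -> IO v)
memoize f = do
  cache <- newIORef Map.empty
  pure $ \k -> do
    m <- readIORef cache
    case Map.lookup k m of
      Just v  -> pure v                     -- found by identity: reuse
      Nothing -> do
        let v = f k                         -- compute once
        modifyIORef' cache (Map.insert k v)
        pure v
```

In the overlay version, the key would be a content hash and the Map a distributed table, so any node that already did the work can answer.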


Hi LightMachine. I can't read Haskell (?). In this screenshot:

https://x.com/VictorTaelin/status/1819208143638831404/photo/...

What's the highlighted program (?) on line 1875997?


This is on CPU vs GPU.

A GPU core (shading unit) is 100x weaker than a CPU core, thus the difference.

On the GPU, HVM's performance scales almost 16000x with 16000 cores. Thus the "near-ideal speedup".

Not everyone knows how GPUs work, so we should have been more clear about that!
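
As a rough back-of-the-envelope (my own arithmetic from the approximate figures above): 16000 GPU cores at roughly 1/100th the strength of a CPU core give an ideal ceiling of about 16000 / 100 = 160x a single CPU core, so a raw CPU-vs-GPU ratio far below 16000x is exactly what near-linear scaling predicts.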


I really think I take criticism well... The problem is that people were criticizing us for not doing things that were literally done in the second paragraph. So at that point it didn't feel like productive criticism. That's like being criticized for being naked when you're fully clothed. How do you even make sense of that...


People are more childish than they like to believe. It's a mix of jealousy and ignorant skepticism. What you're doing is incredibly interesting; I look forward to seeing it develop!


The fact that people compile summaries of discussions from HN comments like they've extracted gold shows the level of douchebaggery.


You are doing superbly. Just remind yourself that there are people who think Elon is incompetent despite Tesla and SpaceX.


I have no idea what you're trying to convey, but I'm Victor Taelin. Also very cool comment on that thread, hypothesizing on whether we'd be able to ever run it on GPUs. We did it! That is what we're announcing today.


Great, thanks for clarifying.


Thanks for the feedback. Some corrections:

We do use multi-level caching, and you can achieve 5x higher performance by using it correctly. FFI is already implemented, just not published, because we want to release it with graphics rendering, which I think will be really cool. Haskell/GHC uses graphs and trees too, and nobody would say it is not practical or useful. And while it is true that arrays are king, there are many SOTA algorithms that are implemented in Haskell (including compilers, type-checkers, solvers) because they do not map well to arrays at all.

The main reason ICs are not fast is that nobody has ever done low-level optimization work on them. All previous implementations were terribly inefficient, and my own work is too, because I spent all my time so far trying to get it to run *correctly* on GPUs, which was very hard. As you said yourself, there aren't even loops yet. So, how can we solve that? By adding the damn loops! Or do you think there is some inherent limitation preventing us from doing that? If you do, you'll be surprised.

HVM2 is finally a correct algorithm that scales. Now we'll optimize it for the actual low-level performance.


> HVM2 is finally a correct algorithm that scales.

This, I think, is the key thing people are missing.

Maybe your low level performance will never be as good as hoped, but for this sort of task, "the parallelisation part works and produces correct results" might not be sufficient but is absolutely necessary, and any optimisation work done before that has such a high probability of having to be thrown away that under similar circumstances I wouldn't bother in advance either.


You're comparing CPU cores to GPU cores!

It is "only" 50x because a single GPU core is 100x weaker than a CPU core!

Within CUDA cores, it is actually a linear speedup! It does 2k MIPS with 1 CUDA core, and ~28000 MIPS with 16k CUDA cores. If we double the performance of single-core GPU evaluation, we almost double the performance with 16k cores!


2k MIPS is 2 GIPS, no?

Great work BTW


So use a metric that makes absolutely no sense in the given domain, instead of one that is completely correct, sensible, accurate, established in the literature, and vastly superior in context? What even is a FLOP in the context of Interaction Net evaluation? These things aren't even interchangeable.


The fact that you don’t know the answer to this question, and don’t even seem to think it is relevant, is chilling.

People want to be able to ground your work—which you are claiming is the “parallel future of computation”—in something familiar. Insulting them and telling them their concerns are irrelevant just isn’t going to work.

I would urge you to think about what a standard comparison versus Haskell would look like. Presumably it would be something that dealt with a large state space, but also top-down computation (something you couldn't easily do with matrices). A big example might be simply taking a giant Haskell benchmark (given the setting of inets it seems like a natural fit) that is implemented in a fairly optimal way, both algorithmically and wrt performance, and comparing directly on large inputs.

Sorry to trash on you here, not trying to come across as insulting, but I agree that “reductions per second” is meaningless without a nuanced understanding of the potentially massive encoding blowup that compilation introduces.

We want to believe, but the claims here are big


I really appreciate the feedback, but the claim is that the performance scales linearly with cores, and it does. Also that it runs on GPUs, and it does. Yet, asking what its "floating point operations per second" are is nonsense, because it is not doing floating point computations. It is doing interactions. Thus the "interactions per second" term, which I didn't invent; it is the term used in the domain's literature.

I truly want to help here, but that is like asking us to tell you how many gallops per second your car does. It just makes no sense in context. If I did invent some conversion, I would be lying, and that would be much worse than using an unfamiliar term. The way to compare across languages is to benchmark and report wall-clock time, which is like "horsepower" in that sense, as it applies to both domains.
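
If it helps to see what gets counted, here is a toy reducer (my own sketch, far simpler than interaction nets) where every rewrite counts as one "interaction"; interactions per second is just that count divided by wall-clock time, playing the role FLOPS plays for numeric code:

```haskell
-- Toy term rewriting with an interaction counter.
data Term = Lit Int | Add Term Term

-- returns (normal form, interactions performed)
reduce :: Term -> (Int, Int)
reduce (Lit n)   = (n, 0)
reduce (Add a b) =
  let (x, i) = reduce a
      (y, j) = reduce b
  in (x + y, i + j + 1)  -- the Add rewrite itself is one interaction

main :: IO ()
main = print (reduce (Add (Lit 1) (Add (Lit 2) (Lit 3))))  -- (6, 2)
```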


You're wrong. The Haskell code is compiled to a loop, which we didn't optimize for yet. I've edited the README to use the Bitonic Sort instead, on which allocations are unavoidable. Past N=20, HVM2 performs 4x faster than GHC -O2.
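
To illustrate what "compiled to a loop" means (a generic example of standard GHC behavior, not the actual benchmark code): a tail-recursive accumulator like the one below becomes a tight, allocation-free register loop under -O2, a best case for GHC, while tree-shaped programs like the Bitonic Sort allocate nodes at every level:

```haskell
-- GHC -O2 compiles this tail recursion to a register loop: no
-- allocation, so it is the kind of code a graph reducer races worst.
sumTo :: Int -> Int -> Int
sumTo acc 0 = acc
sumTo acc n = sumTo (acc + n) (n - 1)

main :: IO ()
main = print (sumTo 0 1000000)  -- 500000500000
```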


What? I ran your example, from your readme, where you promise a massive performance improvement, and you're accusing me of doing something wrong?

This is exactly what a scammer would say.

I guess that's the point here. Scam people who don't know anything about parallel computing by never comparing against any other method?


Thanks for the feedback! Some clarifications:

1. I didn't accuse you of doing something wrong, just that your claim was wrong! It has been proven that Interaction Combinators are an optimal model of concurrent computation. I also pointed to cases where HVM achieves practical efficiency, outperforming GHC's highest optimization level.

2. The claimed performance scaling has indeed been achieved, and the code is open for anyone to replicate our results. The machines used are listed in the repository and the paper. If you have any trouble replicating, please let me know!

3. We're not selling any product. Bend is Apache-licensed.


The insults against anyone who pushes back a little bit are not a good sign, I agree. From all we can see so far, the massive speedups being claimed have zero optimal baselines. I would very much like to see those identified.


I apologize, I gave you the wrong answer.

I thought you were talking about the DEMO example, which ran ~30% slower than expected. Instead, you were talking about the README, which was actually incorrect. I noticed the error and edited it. I explained the issue in another comment.

