
When I worked with Halide, on a problem relatively close to image decoding (transforming a series of angle-amplitude pairs from radioastronomy into an image, essentially a lot of sprite-with-opacity painting), I was perplexed to discover how hard it is to work with non-constant offsets in Halide. The functionality was essentially non-existent back then (2015, I believe).

In fact, scatter-gather operations of any kind were non-existent in Halide.
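To make the missing operation concrete, here is a minimal C++ sketch (not Halide code) of a gather with a data-dependent index, the kind of access that a constant-offset stencil model cannot express; the function name and shapes are illustrative only:

```cpp
#include <cstddef>
#include <vector>

// Gather: out[i] = input[index[i]]. The offsets come from a runtime
// array, so the compiler cannot bound the access pattern statically --
// exactly what early Halide's constant-offset model could not express.
std::vector<float> gather(const std::vector<float>& input,
                          const std::vector<std::size_t>& index) {
    std::vector<float> out(index.size());
    for (std::size_t i = 0; i < index.size(); ++i)
        out[i] = input[index[i]];
    return out;
}
```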

From what I remember, these operations were introduced at some point, but by then we had moved away from Halide.

Also, it was not at all simple to transform a loop that draws sprites over the entire image into one that draws sprites over a part of the image and processes many parts in parallel (i.e., changing the loop nesting). The hand-written CUDA version of the algorithm ended up with exactly that structure.
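For readers unfamiliar with the transformation: the restructured nesting puts tiles outermost and sprites innermost, so independent tiles can run in parallel. A minimal single-threaded C++ sketch of that shape (all names and the accumulation rule are illustrative, not the actual radioastronomy kernel):

```cpp
#include <algorithm>
#include <vector>

struct Sprite { int x, y, w, h; float value; };

// Tiles-outermost nesting: for each tile, paint only the portion of each
// sprite that overlaps it. Each tile touches a disjoint region of img,
// so the two outer loops are safe to parallelize (e.g. one CUDA block
// per tile); the original sprites-outermost loop is not.
void paint_tiled(std::vector<float>& img, int W, int H,
                 const std::vector<Sprite>& sprites, int tile) {
    for (int ty = 0; ty < H; ty += tile)        // parallelizable
        for (int tx = 0; tx < W; tx += tile)    // parallelizable
            for (const Sprite& s : sprites) {
                // Clip the sprite to this tile.
                int x0 = std::max(tx, s.x);
                int x1 = std::min({tx + tile, s.x + s.w, W});
                int y0 = std::max(ty, s.y);
                int y1 = std::min({ty + tile, s.y + s.h, H});
                for (int y = y0; y < y1; ++y)
                    for (int x = x0; x < x1; ++x)
                        img[y * W + x] += s.value;  // accumulate amplitude
            }
}
```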

Thus, if you need a partial-derivatives numerical kernel, Halide is good for you. If you are working on video decoding, Halide is not that good for you. If you are working on video encoding, Halide will be more of a nuisance than a helping hand (early exits from loops, computed access ranges, etc.).




That is all very interesting. If you were drawing sprites, why not use straight OpenGL?

Do you think it would have worked to organize the tiles and threads outside of Halide, and use Halide only for the isolated parts that are already organized into arrays?


We would like to be relatively target-agnostic. OpenGL could be one of the targets, but not the only one. We would also like to run on regular and/or GPU-equipped cluster machines, etc.

On the suggestion in the second part of your comment: why use Halide at all, then? Should it not be the responsibility of Halide to work out the best loop nesting and the best use of threads?

Again, Halide was put aside and we used CUDA for the final version, precisely because of Halide's inability to do a good job in our case.


> Should it not be the responsibility of Halide to work out the best loop nesting and the best use of threads?

I don't know about "should", but it seems to me that Halide would still be valuable, even if you work out the threading and the organization of the data into arrays yourself.

> Again, Halide was put aside and we used CUDA for the final version, precisely because of Halide's inability to do a good job in our case

I didn't say anything about that. I'm not sure why you are restating it.


I suppose Halide was mainly involved for this part:

> transform a series of angle-amplitude pairs (radioastronomy) to the image



