I'm thinking of a similar project myself, and I'm curious what considerations besides loop vectorization go into something like this. In particular, what about cache behavior and memory access patterns? (OK, I ask the same question about every project like this.)
Also, isn't one risk with sparse representations that, if you aren't careful, the data fills in and becomes un-sparse, which slows things down a lot?
A key idea of this language is that you _don't_ give up control over your data structures. You program your algorithm against abstract data structures, as in most high-performance DSLs, but then you _explicitly control_ the implementation of those abstract data structures as physical hierarchical structures. (You can get a taste for this in the `layout` block in Fig. 1.)
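To make that separation concrete, here is a rough Python sketch. It is not the language's actual syntax, and the names `DenseLayout`, `CompressedRowLayout`, `nonzeros`, and `matvec` are made up for illustration. The algorithm is written once against an abstract "iterate over nonzeros" interface, and the physical layout is chosen separately.

```python
from typing import Iterator, List, Tuple

# Hypothetical names throughout; this only illustrates the idea of one
# algorithm running over two explicitly chosen physical layouts.

class DenseLayout:
    """Physical layout 1: every entry stored, row by row."""

    def __init__(self, rows: List[List[float]]):
        self.rows = rows
        self.n_rows = len(rows)

    def nonzeros(self) -> Iterator[Tuple[int, int, float]]:
        for i, row in enumerate(self.rows):
            for j, v in enumerate(row):
                if v != 0.0:
                    yield i, j, v


class CompressedRowLayout:
    """Physical layout 2: dense over rows, compressed over columns (CSR-like)."""

    def __init__(self, n_rows: int, row_ptr: List[int],
                 col_idx: List[int], values: List[float]):
        self.n_rows = n_rows
        self.row_ptr = row_ptr    # row i owns entries row_ptr[i]:row_ptr[i+1]
        self.col_idx = col_idx    # column index of each stored entry
        self.values = values      # value of each stored entry

    def nonzeros(self) -> Iterator[Tuple[int, int, float]]:
        for i in range(self.n_rows):
            for k in range(self.row_ptr[i], self.row_ptr[i + 1]):
                yield i, self.col_idx[k], self.values[k]


def matvec(a, x: List[float]) -> List[float]:
    """The algorithm: written once, independent of the layout of `a`."""
    y = [0.0] * a.n_rows
    for i, j, v in a.nonzeros():
        y[i] += v * x[j]
    return y


# The same 3x3 matrix in both layouts.
dense = DenseLayout([[1.0, 0.0, 2.0],
                     [0.0, 0.0, 0.0],
                     [0.0, 3.0, 0.0]])
csr = CompressedRowLayout(3, [0, 2, 2, 3], [0, 2, 1], [1.0, 2.0, 3.0])
x = [1.0, 2.0, 3.0]

print(matvec(dense, x))
print(matvec(csr, x))
```

Both calls compute the same product; only the storage, and therefore the memory access pattern, differs. In this sketch the layout is picked by choosing a class, which is just a stand-in for declaring it explicitly next to the algorithm.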
Am I right that this language compiles to GPU code?