Essentially all distributed dense linear algebra libraries are built on top of sequential dense linear algebra libraries (mine as well, but on the interface rather than the implementation). Distributed libraries are at least an order of magnitude more complex than their sequential counterparts.
The C++ linear algebra libraries are basically syntactic sugar for interacting with optimized Fortran libraries like BLAS, LAPACK, etc. (I'm sure I'm overlooking some complexity here, because the C++ libraries, while linking to the same Fortran libraries, have very different run times.)
Yours interacts with ScaLAPACK and other distributed memory libraries.
Why would it not be possible to simply extend, say, Armadillo or Eigen to interact with distributed-memory libraries? If you need more syntax, then extend the interface.
Elemental implements, as opposed to wraps, the distributed-memory algorithms. In other words, it does not call any library like ScaLAPACK; it is an alternative approach. Elemental builds on top of BLAS/LAPACK/MPI in order to provide a nice interface to dense linear algebra on clusters/supercomputers.
The other major difference is that sequential libraries tend to get away with letting users ignore where data resides. Data placement is of fundamental importance in distributed libraries, and, for this reason, it is usually a bad idea to think of simply modifying sequential APIs.
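As a rough illustration of why data placement leaks into the interface, here is a minimal sketch of a 1D cyclic distribution of a length-n vector over p processes. The function names (`owner_and_local_index`, `local_length`) are hypothetical and are not taken from Elemental, ScaLAPACK, or any sequential library. The point is only that every global index translates into an (owner, local index) pair, which a sequential API never has to expose.

```python
def owner_and_local_index(g, p):
    """Map global index g to (owning rank, index in that rank's local storage).

    Assumes a 1D cyclic distribution over p processes: entry g lives on
    rank g % p. This is a hypothetical helper, not any library's API.
    """
    return g % p, g // p


def local_length(n, p, rank):
    """Number of entries of a length-n vector that are stored on `rank`."""
    return (n - rank + p - 1) // p


if __name__ == "__main__":
    n, p = 10, 4
    for rank in range(p):
        # Global indices held by this rank; no single process sees all of them.
        held = [g for g in range(n) if owner_and_local_index(g, p)[0] == rank]
        print(rank, held, local_length(n, p, rank))
```

Even reading a single entry requires knowing which process owns it, so a call like `x[g]` in a sequential API has no direct distributed analogue without either communication or an explicit notion of locality.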
So why not contribute to (or extend), say, ScaLAPACK to make it do what you need, and add a wrapper to an existing linear algebra library?
ScaLAPACK has been in development since 1995. I've never touched it, but it's probably many, many lines of code (and maybe a few PhD theses) with all sorts of kinks worked out that would take you decades to iron out yourself. To throw out all that knowledge/work/man-hours and start from scratch seems like a waste.
Because ScaLAPACK has a cumbersome interface, poor internal design, relies on incorrect premises that affect performance (block cyclic vs. elemental cyclic), and is buggy. In contrast, Elemental is a joy to use and is almost always faster, often by a large margin (see the Elemental paper). Elemental is not a toy project by any means; it is already the basis for a large share of the interesting parallel linear algebra research and is used by many important applications. Jack is currently the best researcher/implementer in the direct linear algebra world. Anyone who knows me knows that I am a critical bastard who does not throw such praise around lightly.
I appreciate the compliments, but I disagree with a few of your points:
1. The granularity at which to distribute the entries of the matrix is a long and subtle argument that doesn't produce a clear winner for every operation (the current conclusions are different for LU with partial pivoting vs. reduction to tridiagonal form). I would by no means say that the approach used by ScaLAPACK is wrong, but only that it is unnecessarily complex and only one operation purposefully targets the finest-granularity case.
2. Again, I appreciate the compliment, but I don't think that arguments from authority are valid, nor do I think that one can be the "best". I have a large number of colleagues doing wonderful work, much of which I find extremely impressive.
Also, it would be good to disclose that you're affiliated with the project you were promoting.
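To make the granularity question in point 1 concrete, here is a minimal sketch of the standard 2D block-cyclic owner mapping, assuming the first block lives on process (0, 0) and that the same block size `nb` is used for rows and columns. Setting `nb = 1` gives the element-wise ("elemental") cyclic case. The helper names are hypothetical and do not come from either library. The sketch only counts how many entries of a lower triangle each process owns, which is one aspect of the trade-off and not a performance conclusion.

```python
def owner(i, j, nb, pr, pc):
    """Process-grid coordinates owning global entry (i, j).

    Standard 2D block-cyclic mapping with nb x nb blocks on a pr x pc grid,
    assuming block (0, 0) is stored on process (0, 0). nb = 1 is the
    element-wise cyclic distribution.
    """
    return (i // nb) % pr, (j // nb) % pc


def lower_triangle_counts(n, nb, pr, pc):
    """Count lower-triangular entries (i >= j) of an n x n matrix per process."""
    counts = {(r, c): 0 for r in range(pr) for c in range(pc)}
    for i in range(n):
        for j in range(i + 1):
            counts[owner(i, j, nb, pr, pc)] += 1
    return counts


if __name__ == "__main__":
    for nb in (4, 1):
        print(nb, lower_triangle_counts(8, nb, 2, 2))
```

For this small 8 x 8 case on a 2 x 2 grid, `nb = 4` leaves process (0, 1) with no lower-triangular entries while (1, 0) holds 16, whereas `nb = 1` gives 10, 6, 10, and 10. Load balance is only part of the picture; larger blocks change the communication and local-kernel behavior, which is why the answer differs from one operation to another.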
It is often unhelpful when people who have "never touched" a piece of software comment on it. If you had touched ScaLAPACK or Elemental, you would realize that the algorithmic know-how and performance engineering from ScaLAPACK has been absorbed into Elemental. However, ScaLAPACK was hampered by the conscious decision to maintain interface compatibility with LAPACK. One of the most valuable things about Elemental is that it can dispense with this antiquated API. Moreover, Elemental has a modern configure and build system, I/O support, and a vibrant development community. I would recommend that you investigate the community resources to satisfy your strong curiosity in this area.
The research behind ScaLAPACK was very worthwhile and led to a huge number of algorithms and insights, and it took me several years of earnest weekend/late-night work to get Elemental to its current state (often drawing from the previous work on ScaLAPACK and PLAPACK). I have referenced a large number of their publications in my source code.
With that said, if you were familiar with the source code and APIs of both libraries, I think that you would see a clear benefit. This is supported by the way Elemental is growing: it arguably now has more functionality than ScaLAPACK (the notable exception being a parallel Schur decomposition). Over the past couple of years, the library has primarily been developed to support my research goals, but a large number of research groups are actively using it now.
The rules of HN state: "Essentially there are two rules here: don't post or upvote crap links, and don't be rude or dumb in comment threads."
I think you have unintentionally violated the last provision. Please don't post about things you have not taken the time to investigate at all, which is true by your own admission: "I've never touched [ScaLAPACK]".