>I have a similar conjecture for glidesort but I'll have to do some thinking to see if I can prove it.
It probably is technically O(n (log n)^2) moves, but I suspect that when the rotations are fast enough (trinity / bridge rotations) the cost saving from localization nullify the additional moves.
>I do cite you in the presentation in the section on ping-pong merges
Wikisort does have prior claim to ping-pong merges, and it likely goes further back than that. I did independently implement them in quadsort.
>I do also cite you for what you call parity merges (something I did not use at the time of the presentation
This feels like a tricky claim to me, since I assume your bidirectional merge was fully inspired by my parity merge. If so you weren't the first to implement one, I did so while working on the parity merge, but I figured it was self-evident the 2x gain would apply for a more traditional merge, and I did state the gain was from accessing two memory regions in the quadsort article.
I've since made a quadsort release that uses a galloping 'bidirectional merge', though it still takes advantage of the parity principle, so properly speaking it's a bidirectional galloping parity merge hybrid that sorts 2 blocks of 8 elements at a time, without boundary checks for the 16 merge operations.
>As you could notice, I didn't have a large time slot for my presentation
I did take that into account, I guess it's human nature to get butthurt over stuff like this and I couldn't help myself from whining a little. It is better than bottling it up and I apologize if I caused you unease in turn.
>Hopefully this does show my good intent
It does, and I appreciate it.
>That said, glidesort does not use a branchless binary search.
I assume that's one of the advantages of building upon powersort? The blocky nature of quadsort is both a blessing and a curse in that regard.
>I call my partitioning method bidirectional partitioning.
I meant "parity merge" / "partition". If you think about it quicksort is naturally bidirectional, which is why branchless partitioning works without further work, and undoubtedly why it wasn't self-evident to most. I did try to write a bidirectional partition early on, but it's either my hardware or I messed up somehow.
I agree it's possible to write a bidirectional merge without utilizing the parity principle. One of the trickiest things in programming is properly naming things.
As for instruction-level parallelism, I do think it's actually almost entirely memory-level parallelism. I could be wrong though.
A nitpick, but my name is Orson Peters, Peters is my last name :)
> I assume that's one of the advantages of building upon powersort? The blocky nature of quadsort is both a blessing and a curse in that regard.
I don't know, I only use a binary search when splitting up merges, and almost no time is spent in this routine. I use this for the low-memory case, as well as to create more parallelism to use for instruction-level parallelism.
> As for instruction-level parallelism, I do think it's actually almost entirely memory-level parallelism. I could be wrong though.
I didn't do specific research into which effect it is, when I say ILP I also mean the memory effects of that.
I actually knew that, not sure what went wrong in my brain.
>I don't know, I only use a binary search when splitting up merges, and almost no time is spent in this routine.
You'll still get a definite speed-up, worth benching the difference just in case. I guess there's no easy way to avoid a binary search.
>I didn't do specific research into which effect it is, when I say ILP I also mean the memory effects of that.
They're indeed related. I did some empirical testing and I'm quite sure it's cache related. Thinking about it, one issue I may have had was not benching sorting 100M elements, which might be where a bidirectional partition might benefit on my system.
>I have a similar conjecture for glidesort but I'll have to do some thinking to see if I can prove it.
It probably is technically O(n (log n)^2) moves, but I suspect that when the rotations are fast enough (trinity / bridge rotations) the cost saving from localization nullify the additional moves.
>I do cite you in the presentation in the section on ping-pong merges
Wikisort does have prior claim to ping-pong merges, and it likely goes further back than that. I did independently implement them in quadsort.
>I do also cite you for what you call parity merges (something I did not use at the time of the presentation
This feels like a tricky claim to me, since I assume your bidirectional merge was fully inspired by my parity merge. If so you weren't the first to implement one, I did so while working on the parity merge, but I figured it was self-evident the 2x gain would apply for a more traditional merge, and I did state the gain was from accessing two memory regions in the quadsort article.
I've since made a quadsort release that uses a galloping 'bidirectional merge', though it still takes advantage of the parity principle, so properly speaking it's a bidirectional galloping parity merge hybrid that sorts 2 blocks of 8 elements at a time, without boundary checks for the 16 merge operations.
>As you could notice, I didn't have a large time slot for my presentation
I did take that into account, I guess it's human nature to get butthurt over stuff like this and I couldn't help myself from whining a little. It is better than bottling it up and I apologize if I caused you unease in turn.
>Hopefully this does show my good intent
It does, and I appreciate it.
>That said, glidesort does not use a branchless binary search.
I assume that's one of the advantages of building upon powersort? The blocky nature of quadsort is both a blessing and a curse in that regard.
>I call my partitioning method bidirectional partitioning.
I meant "parity merge" / "partition". If you think about it quicksort is naturally bidirectional, which is why branchless partitioning works without further work, and undoubtedly why it wasn't self-evident to most. I did try to write a bidirectional partition early on, but it's either my hardware or I messed up somehow.
I agree it's possible to write a bidirectional merge without utilizing the parity principle. One of the trickiest things in programming is properly naming things.
As for instruction-level parallelism, I do think it's actually almost entirely memory-level parallelism. I could be wrong though.