I believe that Daniel Huttenlocher and Pedro Felzenszwalb should be credited for the multi-pass (first X, then Y) transform based on quadratic distance:
That second paper, from 1996, references an even older paper from 1994, saying “Dividing rows and columns alternately, Chen and Chuang reduced the time complexity to O(N^2) which is optimal.”
https://ecommons.cornell.edu/handle/1813/5663
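
For anyone unfamiliar with what "first X, then Y" means here, a minimal sketch of a separable squared-distance transform follows. It uses the lower-envelope-of-parabolas idea commonly associated with Felzenszwalb and Huttenlocher, written from memory rather than copied from their paper. The names `dt1d`, `dt2d`, and `INF` are my own, and the large finite `INF` (instead of a true infinity) is an assumption made to avoid NaNs in the intersection formula.

```python
INF = 1e20  # assumed "no feature here" cost; finite so inf - inf never occurs


def dt1d(f):
    """1D pass: d[q] = min over p of (q - p)^2 + f[p], in O(n)."""
    n = len(f)
    d = [0.0] * n
    v = [0] * n            # v[k]: index of the k-th parabola in the lower envelope
    z = [0.0] * (n + 1)    # z[k]..z[k+1]: range where parabola v[k] is lowest
    k = 0
    z[0], z[1] = -INF, INF
    for q in range(1, n):
        while True:
            p = v[k]
            # x-coordinate where the parabolas rooted at p and q intersect
            s = ((f[q] + q * q) - (f[p] + p * p)) / (2 * q - 2 * p)
            if s <= z[k]:
                k -= 1     # parabola p is hidden by q; drop it and retry
            else:
                break
        k += 1
        v[k] = q
        z[k] = s
        z[k + 1] = INF
    k = 0
    for q in range(n):
        while z[k + 1] < q:
            k += 1
        p = v[k]
        d[q] = (q - p) ** 2 + f[p]
    return d


def dt2d(grid):
    """Squared Euclidean distance to the nearest truthy cell of grid[y][x]."""
    h, w = len(grid), len(grid[0])
    f = [[0.0 if grid[y][x] else INF for x in range(w)] for y in range(h)]
    # Pass 1 (X): transform every row independently.
    f = [dt1d(row) for row in f]
    # Pass 2 (Y): transform every column of the row-pass result.
    cols = [dt1d([f[y][x] for y in range(h)]) for x in range(w)]
    return [[cols[x][y] for x in range(w)] for y in range(h)]
```

The two passes work because the squared distance splits into an X term and a Y term, so the row pass can feed its result into the column pass as the per-cell cost. The output is the squared distance; take a square root per cell if the actual Euclidean distance is needed.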