
>Square roots are implemented in hardware

But how does that hardware implementation work internally?

The point I'm trying to make is that it is probably an (in-hardware) loop that uses Newton's Method.

ETA: The point being that, although in the source code it looks like all loops have been eliminated, they really haven't been if you dig deeper.
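
For illustration, here's roughly the shape of such a loop in software terms -- just a sketch of the Newton/Heron idea (function name and structure are mine), not how any particular FPU actually does it; real units tend to use table-driven seeds plus a fixed, small number of steps, or a digit-recurrence scheme:

    #include <stdio.h>

    /* Sketch only: Newton's method on f(x) = x^2 - a, i.e. x <- (x + a/x) / 2. */
    float sqrt_newton(float a) {
        if (a <= 0.0f) return 0.0f;        /* ignore NaN/negative handling here */
        float x = a > 1.0f ? a : 1.0f;     /* crude initial guess >= sqrt(a) */
        for (int i = 0; i < 30; i++) {     /* real hardware would fix the count */
            float next = 0.5f * (x + a / x);
            if (next == x) break;          /* converged to float precision */
            x = next;
        }
        return x;
    }

    int main(void) {
        printf("%f\n", sqrt_newton(2.0f)); /* prints ~1.414214 */
        return 0;
    }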



> how does that hardware implementation work internally?

I don’t know, but based on the performance difference between FP32 and FP64 square root instructions, the implementation probably produces 4-5 bits of mantissa per cycle.
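
(Purely illustrative numbers, since I don't have real latencies to hand: if an FP32 sqrt took ~6 cycles for its 24-bit mantissa and an FP64 sqrt ~12 cycles for its 53-bit one, both would come out to roughly 4-5 bits per cycle -- that's the kind of back-of-the-envelope comparison I mean.)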


There are other methods used in hardware, for example:

https://en.wikipedia.org/wiki/Methods_of_computing_square_ro...

Something like Heron's method, which is itself a special case of Newton's method.
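
To spell that out: applying Newton's method to f(x) = x^2 - a gives

    x_{n+1} = x_n - f(x_n)/f'(x_n) = x_n - (x_n^2 - a)/(2 x_n) = (x_n + a/x_n)/2

which is exactly Heron's "average x with a/x" update.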


Interesting that your linked algorithm manages to avoid costly divisions, but it uses an even longer loop than Newton's Method -- one iteration for every 2 bits. NM converges quadratically, doubling the number of correct bits each time, so a 32-bit number won't ever need more than 5 iterations, assuming >= 1 bit of accuracy in the initial estimate, which is easy to obtain just by shifting right by half the offset of the highest 1-bit.
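
For reference, the usual division-free digit-by-digit (binary) formulation from that page looks like this -- I'm assuming this is the variant in question. It uses only shifts, adds, and compares, consuming 2 input bits and producing 1 result bit per iteration:

    #include <stdint.h>

    /* Binary digit-by-digit (restoring) integer square root. */
    uint32_t isqrt_bitwise(uint32_t n) {
        uint32_t res = 0;
        uint32_t bit = 1u << 30;      /* highest power of 4 in a 32-bit word */

        while (bit > n)               /* skip down to the highest power of 4 <= n */
            bit >>= 2;

        while (bit != 0) {
            if (n >= res + bit) {
                n -= res + bit;
                res = (res >> 1) + bit;
            } else {
                res >>= 1;
            }
            bit >>= 2;                /* next pair of input bits */
        }
        return res;                   /* floor(sqrt(original n)) */
    }

So a full 32-bit input can take up to 16 iterations, versus the ~5 a Newton/Heron loop needs once the initial estimate is in the right ballpark.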


There are trade-offs (constant time, perhaps?) and many differing applications ...

For example: Pipelined RISC DSP chips have fat (many parallel streams) "one FFT result per clock cycle" pipelines that are rock solid (no cache misses or jitter).

The setup takes a few cycles, but once primed it's

acquired data -> ( pipeline ) -> processed data

every clock cycle (with a pipeline delay, of course).

In that domain, hardware implementations are chosen to work well with vector calculations and with consistent, capped timings.

( To be clear, I haven't looked into that specific linked algo; I'm just pointing out that it's not an N.R.-only world. )
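
As a software-flavoured sketch of what "consistent capped timing" can look like (my own illustration, not lifted from any particular DSP): the digit-by-digit recurrence above can be written with a fixed iteration count and no data-dependent branches, which is the shape that maps naturally onto a fully pipelined unit:

    #include <stdint.h>

    /* Fixed-latency variant: always 16 passes, no data-dependent branches. */
    uint32_t isqrt_fixed_latency(uint32_t n) {
        uint32_t res = 0;
        uint32_t rem = n;
        for (int shift = 30; shift >= 0; shift -= 2) {
            uint32_t bit   = 1u << shift;
            uint32_t trial = res + bit;
            /* branch-free select: mask is all-ones if rem >= trial, else zero */
            uint32_t mask  = 0u - (uint32_t)(rem >= trial);
            rem -= trial & mask;
            res  = (res >> 1) + (bit & mask);
        }
        return res;                   /* floor(sqrt(n)) */
    }

Every input takes exactly the same number of steps, so the latency is constant and easy to pipeline.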



