Floating format 1:1:0:1's 8 possible values:

    000: +0
    001: +1 ( denormal: (-1)^0 * 0.5 * 2^(-0+1) )
    010: +inf
    011: +qnan
    100: -0
    101: -1 ( denormal: (-1)^1 * 0.5 * 2^(-0+1) )
    110: -inf
    111: -qnan

=== Floating point crib sheet ===

--- Format ---

Field widths in bits, as sign:exponent:explicitly stored leading mantissa bit:mantissa fraction:

       binary16 = 1:5:0:10
       bfloat16 = 1:8:0:7
    TensorFloat = 1:8:0:10
           fp24 = 1:7:0:16
       binary32 = 1:8:0:23 
       binary64 = 1:11:0:52
           8087 = 1:15:1:63
      binary128 = 1:15:0:112

--- Interpretation ---

leading bit = (exponent != 0) ? 1 : 0 when implicit (not stored)

bias = 2^(exponent bits - 1) - 1

value = (-1)^sign * 0 when zero

value = (-1)^sign * {{leading bit}}.{{mantissa fraction}}b * 2^(exponent - bias) when normal

value = (-1)^sign * 0.{{mantissa fraction}}b * 2^(-bias+1) when denormal
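The interpretation rules above can be sketched in Python. This is a minimal illustration, not a reference implementation: the function name `decode` and its parameters are made up for this example, it returns exact `Fraction` values, and it returns `None` for the all-ones exponent (inf/nan), which the value formulas do not cover. Handling of a stored leading bit (`explicit=1`) is an assumption based on the "when implicit" wording.

```python
from fractions import Fraction

def decode(bits, exp_bits, frac_bits, explicit=0):
    """Decode a sign:exponent:explicit:fraction pattern into an exact value.

    Hypothetical helper for illustration. Returns None for inf/nan encodings.
    Note: Fraction has no signed zero, so -0 decodes to plain 0.
    """
    frac = bits & ((1 << frac_bits) - 1)
    stored_lead = (bits >> frac_bits) & 1            # only meaningful if explicit
    exponent = (bits >> (frac_bits + explicit)) & ((1 << exp_bits) - 1)
    sign = (bits >> (frac_bits + explicit + exp_bits)) & 1

    bias = (1 << (exp_bits - 1)) - 1                 # 2^(exponent bits - 1) - 1
    if exponent == (1 << exp_bits) - 1:              # exponent == ~0: inf or nan
        return None

    # Leading bit: stored when explicit, otherwise implied by the exponent.
    leading = stored_lead if explicit else (1 if exponent != 0 else 0)
    significand = leading + Fraction(frac, 1 << frac_bits)

    # Zero and denormals use the fixed scale 2^(-bias+1).
    scale = exponent - bias if exponent != 0 else 1 - bias
    return (-1) ** sign * significand * Fraction(2) ** scale

# The 1:1:0:1 toy format: 001 is a denormal that decodes to +1.
print(decode(0b001, exp_bits=1, frac_bits=1))
# binary16: 0x3C00 is 1.0, 0x0001 is the smallest denormal, 2^-24.
print(decode(0x3C00, exp_bits=5, frac_bits=10))
print(decode(0x0001, exp_bits=5, frac_bits=10))
```

Using `Fraction` keeps the arithmetic exact, so the result does not depend on the host's own floating-point format.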

--- Classification ---

zero = exponent == 0 && mantissa fraction == 0

denormal = exponent == 0 && mantissa fraction != 0

normal = exponent != 0 && exponent != ~0

inf = exponent == ~0 && mantissa fraction == 0

nan = exponent == ~0 && mantissa fraction != 0

snan = nan && msb(mantissa fraction) == 0

qnan = nan && msb(mantissa fraction) == 1
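The classification rules above translate almost line for line into code. The sketch below is illustrative only: the name `classify` and its parameters are invented for this example, and it assumes a format with an implicit leading bit (the third field is 0), so it does not cover the 8087 layout.

```python
def classify(bits, exp_bits, frac_bits):
    """Classify a bit pattern of a sign:exponent:0:fraction format.

    Hypothetical helper for illustration; assumes no stored leading bit.
    """
    frac = bits & ((1 << frac_bits) - 1)
    exponent = (bits >> frac_bits) & ((1 << exp_bits) - 1)
    all_ones = (1 << exp_bits) - 1                   # the "~0" exponent

    if exponent == 0:
        return "zero" if frac == 0 else "denormal"
    if exponent != all_ones:
        return "normal"
    if frac == 0:
        return "inf"
    # NaN: the most significant fraction bit separates quiet from signaling.
    msb = (frac >> (frac_bits - 1)) & 1
    return "qnan" if msb else "snan"

# All 8 encodings of the 1:1:0:1 toy format, in order 000..111.
print([classify(b, exp_bits=1, frac_bits=1) for b in range(8)])
# A few binary32 patterns.
for pattern in (0x3F800000, 0x00000001, 0x7F800000, 0x7F800001, 0x7FC00000):
    print(hex(pattern), classify(pattern, exp_bits=8, frac_bits=23))
```

In the 1:1:0:1 format the single fraction bit is also its own msb, which is why that format has quiet NaNs but no signaling ones.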

PS: It often takes fewer gates to implement a simpler microcoded microarchitecture than to implement a single fully hardwired macroarchitecture. Microcoded designs are theoretically slower than hardwired ones, but this is often not the case in practice, because the costs of gate fanout and of the extra gates needed for clock distribution offset the gains of having everything fully specified and decoded in hardware.


