Hacker News new | past | comments | ask | show | jobs | submit login
Show HN: The BLAKE3 hash function running on a 6502 microprocessor (github.com/oconnor663)
91 points by oconnor663 on Feb 12, 2022 | hide | past | favorite | 21 comments



The computer this runs on is Ben Eater's 6502 breadboard kit, which I highly recommend to anyone with the slightest interest in wires and NAND gates: https://eater.net/6502. After the hello world program was working, I wanted to try something more substantial, so here's the world's slowest implementation of BLAKE3 :) If you have Rust and Cargo installed, you can run `make emulate` to test out the program in your terminal.


Ha, this is neat, I haven't seen the permutation table presented this way before, good stuff.

    permute:
    ; Rather than permuting the 32-bit words stored in the
    ; COMPRESS_MSG array, we permute the 8-bit pointers stored
    ; in the MPTRS array.
    ;
    ; The permutation is defined to be:
    ; 2, 6, 3, 10, 7, 0, 4, 13, 1, 11, 12, 5, 9, 14, 15, 8
    ;
    ; If we walk those assignments, we get two loops:
    ; 0<-2<-3<-10<-12<-9<-11<-5<-0
    ; 1<-6<-4<-7<-13<-14<-15<-8<-1


I'm pleased with that part, but I'm ashamed of my ror12 function :p I'd love it if anyone could suggest a better way to do it.


http://6502.org/source/general/SWN.html swaps the nybbles in A in 8 bytes/12 cycles. That’s not quite what you need, but may give you ideas.


Oh that looks perfect. I'll have to try it out today and report back.


If you want to be on the far end of the speed/space trade off you could simply use a lookup table. That should speed things up considerably. You'd index it by byte (split the tables, save one shift left) and then recombine the results.


Great work! You might like https://github.com/janroesner/sixty5o2 as a lightweight OS for the BE6502, which makes program changes way easier.


Another fun thing might be to compute e to 100,000 digits, like Steve Wozniak did on the Apple II [1].

[1] https://archive.org/details/byte-magazine-1981-06/page/n393/...


I did Blake2s once, also worked. Still dog slow of course.

https://twitter.com/IcePic_dz/status/839548134929858560?s=20...


That's awesome. I'd love to see the source if you have it. The BLAKE2s and BLAKE3 round functions are identical, so there must be a lot in common.


The fact this takes almost 18 minutes on a 6502 but only ~100ms on my phone is incredible. One of those things that makes you realise we have so much computing power it's hard to comprehend.


Where did you get 100 milliseconds? On your phone it should be closer to 1ms.


Running it in Termux with `time` before it. Maybe shuffling data through the bash pipe or the slow read of `head` was a contributor.


Given how the 6502, 8051, and Z80 cores seem to show up in a ton of ultra-low-cost products (running at many times their original clock speeds, and using bank switching to extend the address space into the megabytes range), some even networked with ability to do TLS, the fact that a 6502 can do BLAKE3 isn't so surprising, and IMHO really helps reinforce the notion that crypto == math.

Also somewhat related: https://news.ycombinator.com/item?id=9593412 and https://news.ycombinator.com/item?id=20383561


@oconnor663: (Since you are here a question): In single threaded op mode, which is the recommended choice: Blake2b or Blake3?


I think the biggest difference between them is that BLAKE3 is still quite new. I agree with https://latacora.micro.blog/2018/04/03/cryptographic-right-a... that most cryptographic applications should make "boring" choices, and BLAKE3 is not yet boring. But assuming everything goes smoothly over the next 2-3 years, I'll start recommending BLAKE3 over BLAKE2 in essentially all cases.

If you're curious about the single-threaded performance differences between BLAKE3 and BLAKE2b, I might clarify that the red bar chart at the top of the BLAKE3 readme is actually a single-threaded measurement. Everything depends on your input size though, and we have some more detailed graphs in the paper.


Stockfish on 6502 let's go go go


>CC0

Sure is long and complicated. I wish they'd dual license it as MIT.


Unlicense [0] is the "short and simple" counterpart to CC0. I like the idea of simple language licenses, too, but there are thought to be some issues with these [1].

[0]: https://unlicense.org/ [1]: https://softwareengineering.stackexchange.com/q/147111/10551...


Yes, short and simple is half of why I favor MIT.

The other half is that it is also a very widespread and thus well-understood license that I am not afraid to use.


It is dual licensed with Apache License 2.0 which may be better than CC0 in your case.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: