stoptrlling's comments (Hacker News)

Does anyone know the computational cost of training with these LoRA designs? Given that we are talking about rates of tokens per second, it seems training on a bigger dataset could be extremely expensive.


The adapter and LoRA have drastically fewer parameters, so one might expect forward + backward to be roughly 2x the cost of a forward pass.
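A minimal sketch of why the trainable parameter count is so small: LoRA replaces a full weight update with two low-rank factors A (d x r) and B (r x d). The dimensions below are assumed for illustration, not taken from the thread.

```python
# Hypothetical sizes: hidden dimension d and LoRA rank r.
d, r = 4096, 8

full_params = d * d           # parameters in one frozen d x d weight W
lora_params = d * r + r * d   # trainable parameters in A (d x r) and B (r x d)

print(full_params)                  # 16777216
print(lora_params)                  # 65536
print(lora_params / full_params)    # 0.00390625 (~0.4% of the full matrix)
```

With rank 8 against a 4096-wide layer, the trainable parameters are under half a percent of the frozen weight, which is why the optimizer state and gradient storage are cheap even though the forward pass still runs the full model.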

Then (as far as I know), in contrast to generation, training is done on the entire output of the transformer at once (all tokens of the full input) rather than serially, token by token (in the RNN days this was called teacher forcing), so that may give you a significant boost in tokens per second over generation.
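A toy sketch of the teacher-forcing point above: during training, one pass over a sequence of length T yields T-1 supervised next-token pairs in parallel, whereas generation would need a separate serial step per token. The token IDs here are made up for illustration.

```python
# Toy token-id sequence (hypothetical values).
tokens = [5, 9, 2, 7, 3]

# Teacher forcing: feed positions 0..T-2, predict positions 1..T-1.
inputs = tokens[:-1]   # what the model sees at each position
targets = tokens[1:]   # the "next token" label at each position

# One forward pass produces a training signal at every position at once.
pairs = list(zip(inputs, targets))
print(pairs)  # [(5, 9), (9, 2), (2, 7), (7, 3)]
```

Generation, by contrast, must sample token t before it can compute token t+1, so throughput per sequence is bounded by that serial dependency.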


Looks like Germany is achieving degrowth faster than their targets


