bkitano19's comments

Related work:

Interpreting Modular Addition in MLPs https://www.lesswrong.com/posts/cbDEjnRheYn38Dpc5/interpreti...

Paper Replication Walkthrough: Reverse-Engineering Modular Addition https://www.neelnanda.io/mechanistic-interpretability/modula...


And more recently, Language Models Use Trigonometry to Do Addition https://arxiv.org/abs/2502.00873

hume.ai specializes in expressive prosody for TTS (disclaimer - I work here)


Time to first token is just as important to know for many use cases, yet people rarely report it.



this is nuts


We think so too, big things coming :)


www.juicelabs.co


+1, had the fortune to work with him at a previous startup and to meet up in person. Our convo very much broadened my perspective on engineering as a career and a craft; always excited to see what he's working on. Good luck Simon!


https://transformer-circuits.pub/2022/in-context-learning-an...

There is a lot of evidence to suggest that they are performing induction.


Had this exact problem (Heroku Postgres to RDS) at my old co. The data migration went as badly as it possibly could (dropped indices, foreign keys, everything but the data itself). This would have saved us months of pain.


https://hootdoogs.com/ - "find food between you guys"


High integrity maintainer, big respect.


Karpathy covers this in Makemore, but the tl;dr is that if you don't normalize the batch (essentially center and scale your activations so they're roughly unit Gaussian), then at gradient/backprop time you may get values that are significantly smaller or larger than 1. This is a problem because as you stack layers in sequence (passing outputs to inputs), the gradient compounds (because of the Chain Rule), so what may have been a well-behaved gradient at the final layers has either vanished (the upstream gradients were 0 < x < 1 at each layer) or exploded (the upstream gradients were x >> 1). Batch normalization helps control the vanishing/exploding gradient problem in deep neural nets by normalizing the values passed between layers.
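
A minimal sketch in PyTorch of the training-time centering and scaling step (hypothetical names, no running statistics or momentum, just to illustrate the idea):

    import torch

    def batch_norm(x, gamma, beta, eps=1e-5):
        # x: (batch_size, n_features) pre-activations from a linear layer.
        # Center and scale each feature over the batch so it is roughly N(0, 1),
        # then apply a learnable scale (gamma) and shift (beta).
        mean = x.mean(dim=0, keepdim=True)
        var = x.var(dim=0, keepdim=True, unbiased=False)
        x_hat = (x - mean) / torch.sqrt(var + eps)
        return gamma * x_hat + beta

    # Usage: normalize hidden-layer pre-activations before the nonlinearity.
    x = torch.randn(32, 100) @ torch.randn(100, 200)  # a batch of pre-activations
    gamma, beta = torch.ones(200), torch.zeros(200)
    h = torch.tanh(batch_norm(x, gamma, beta))

Without that normalization, a saturating nonlinearity like tanh gets pushed into its flat regions where the local gradient is near zero, which is exactly the vanishing behavior described above.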


got it, thanks

