
Yes, many angles on the alignment problem can be studied now, and work on them has started making good progress recently. Some of it will turn out in retrospect not to have been relevant, due to architectural shifts, but not all of it. Some things are specific to LLMs; some are specific to transformers but not to language-model transformers; some are conceptual and likely to still apply to quite different systems; and some are just field-building, not specific to any architecture at all.

Eg in mechanistic interpretability, there are a lot of findings on LLMs that turn out to generalize across a wider set of NN architectures. The SoLU work at https://transformer-circuits.pub/2022/solu/index.html, for instance, couldn't have been done without access to LLMs, but looks likely to generalize to future architectures.
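
For concreteness, the core of that paper is the SoLU ("Softmax Linear Unit") activation: each hidden vector is multiplied elementwise by its own softmax, which amplifies the largest entries and suppresses the rest, making MLP neurons empirically easier to interpret. A minimal PyTorch sketch, assuming the paper's formulation (the paper also applies a LayerNorm after this step):

    import torch

    def solu(x: torch.Tensor) -> torch.Tensor:
        # SoLU(x) = x * softmax(x), applied over the hidden dimension.
        # Sharpens activations toward a few dominant directions, which
        # is what makes the resulting neurons easier to inspect.
        return x * torch.softmax(x, dim=-1)

Note this isn't architecture-specific: it's a drop-in activation change, which is part of why findings like this are expected to carry over to future non-transformer systems.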


