Empirical interview study with hackers. While there's a lot of "throwing ML over hacking", there seems to be little research into how hackers actually work.
Vision paper that connects gpt-3.5-turbo to a vulnerable VM over SSH and asks the model to become root. A follow-up paper that builds a better vulnerable-VM benchmark and retests with gpt-3.5, gpt-4 and local LLMs seems to be in the works.
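
To make the setup in that vision paper concrete, here is a minimal sketch of the general loop it implies: ask a model for a command, run it, feed the output back, and stop once the session is root. This is not the paper's code. `run_agent`, `FakeVM` and `scripted_model` are hypothetical names, the "VM" is an in-process stub with an assumed sudo misconfiguration, and the "model" just replays a fixed script, so nothing here touches SSH or a real LLM API.

```python
from typing import Callable, List, Tuple

# Each history entry is (command, output).
History = List[Tuple[str, str]]


def run_agent(next_command: Callable[[History], str],
              execute: Callable[[str], str],
              max_rounds: int = 10) -> Tuple[bool, History]:
    """Ask for a command, run it, record the output; stop once `id -u` reports 0."""
    history: History = []
    for _ in range(max_rounds):
        command = next_command(history)
        output = execute(command)
        history.append((command, output))
        # Success check: uid 0 means the session is root.
        if execute("id -u").strip() == "0":
            return True, history
    return False, history


class FakeVM:
    """Hypothetical stand-in for the SSH session; nothing leaves the process."""

    def __init__(self) -> None:
        self.is_root = False

    def execute(self, command: str) -> str:
        if command == "id -u":
            return "0" if self.is_root else "1000"
        if command == "sudo -l":
            # Assumed misconfiguration, for illustration only.
            return "(ALL) NOPASSWD: ALL"
        if command == "sudo -i":
            self.is_root = True
            return ""
        return "command not found"


def scripted_model(history: History) -> str:
    """Hypothetical stand-in for the LLM call: replays a fixed command list."""
    script = ["whoami", "sudo -l", "sudo -i"]
    return script[min(len(history), len(script) - 1)]


vm = FakeVM()
got_root, history = run_agent(scripted_model, vm.execute)
print(got_root, len(history))
```

In a real setup `execute` would wrap an SSH session and `next_command` would wrap a chat-completion call with the history rendered into the prompt; both are left as plain callables here so the loop can be read and tested on its own.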