
How does this compare to fine tuning something like BERT?



I would say they're similar, since the building block for both is the Transformer. In this blog post, the fine-tuning strategy used is Adapters: small learnable layers added inside each Transformer block, which are the only parts trained during fine-tuning.
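A minimal sketch of what an adapter looks like in PyTorch (module name and dimensions are my own, just to illustrate the idea, not taken from the post): a small bottleneck MLP with a residual connection, inserted after a Transformer sub-layer.

    import torch
    import torch.nn as nn

    class Adapter(nn.Module):
        """Bottleneck adapter: down-project, nonlinearity, up-project, residual add."""
        def __init__(self, hidden_dim: int, bottleneck_dim: int = 64):
            super().__init__()
            self.down = nn.Linear(hidden_dim, bottleneck_dim)
            self.act = nn.GELU()
            self.up = nn.Linear(bottleneck_dim, hidden_dim)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # Residual connection keeps the original representation intact,
            # so the adapter only learns a small correction on top of it.
            return x + self.up(self.act(self.down(x)))

In practice only these adapter parameters (plus usually the layer norms and the task head) are marked trainable while the pretrained Transformer weights stay frozen, which is why adapter fine-tuning is much cheaper than full fine-tuning.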



