
I haven't tried this myself yet. But I'm surprised you didn't find it beneficial to pass the raw HTML to the chatbot (potentially after some filtering). Did `innerText` give better results than `innerHTML`?

My intuition is that the structural information in the HTML would be useful for extracting structured data.




Great question. The problem with the raw HTML was token count. :)

A rather high percentage of pages are far too large to fit in a GPT prompt!
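
To give a feel for the size gap, here is a minimal Python sketch (not the code from the project being discussed) that strips a page down to its visible text, roughly what `innerText` would give you, and compares a crude token estimate for both versions. The sample markup, the helper names `visible_text` and `rough_token_count`, and the "about 4 characters per token" rule of thumb are all assumptions for illustration; a real tokenizer will give different numbers.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style blocks.
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def visible_text(html: str) -> str:
    """Hypothetical helper: a rough stand-in for the browser's innerText."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def rough_token_count(text: str) -> int:
    # Assumption: ~4 characters per token. A crude heuristic, not a tokenizer.
    return max(1, len(text) // 4)


# Made-up sample page; real pages carry far more markup than this.
sample = """<html><head><style>.a{color:red}</style><script>var x = 1;</script></head>
<body><div class="product" data-id="42"><h1 class="title">Blue Widget</h1>
<span class="price">$19.99</span></div></body></html>"""

text = visible_text(sample)
print(text)
print(rough_token_count(sample), "estimated tokens as HTML vs",
      rough_token_count(text), "as text")
```

Even on this tiny snippet the markup dominates the text, and the tags, class names, and inline scripts on a real page make the gap much wider. A middle ground would be to filter the HTML (drop scripts, styles, and most attributes) while keeping the tag structure, as suggested above.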



