
Great work! One thing that would be incredibly useful/interesting would be generating a reusable script with an LLM, instead of just grabbing the data. In theory, this should result in a massive cost reduction (no need to call the LLM every time) as long as the source code doesn't change, which would make it sustainable for constant, frequent monitoring.
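A minimal sketch of that "generate once, reuse" idea: ask the LLM for a parser only when no cached one exists for a site, then run the cheap cached version on every subsequent fetch. `generate_parser_source` is a hypothetical stand-in for the actual LLM call.

```python
# Sketch: cache an LLM-generated parser per site so the LLM is only
# called once (until the page structure changes and the parser breaks).
import hashlib
from pathlib import Path

CACHE = Path("parsers")
CACHE.mkdir(exist_ok=True)

def generate_parser_source(html_sample: str) -> str:
    # Hypothetical: prompt an LLM with a page sample and ask it to
    # return the source of a function `parse(html) -> dict`.
    raise NotImplementedError

def get_parser(site: str, html_sample: str):
    path = CACHE / (hashlib.sha256(site.encode()).hexdigest() + ".py")
    if not path.exists():
        path.write_text(generate_parser_source(html_sample))  # one-time LLM cost
    namespace = {}
    exec(path.read_text(), namespace)  # load the cached parser
    return namespace["parse"]
```

If the cached parser starts raising errors or returning empty results, you'd delete the cache entry and regenerate, which amortizes the LLM cost across many runs.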



This approach was studied in the paper on Evaporate+ (https://www.vldb.org/pvldb/vol17/p92-arora.pdf). They used active learning to pick the best function among candidate functions generated by the LLM on a sampled set of data.
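A toy sketch of that selection step (not the paper's exact weak-supervision method): run every candidate extractor over a sample of documents and keep the one that most often agrees with the per-document majority answer.

```python
# Pick the candidate function that agrees most often with the majority
# vote of all candidates on sampled documents.
from collections import Counter

def pick_best(candidates, sample_docs):
    scores = [0] * len(candidates)
    for doc in sample_docs:
        outputs = []
        for fn in candidates:
            try:
                outputs.append(fn(doc))
            except Exception:
                outputs.append(None)  # broken candidates earn no points
        valid = [o for o in outputs if o is not None]
        if not valid:
            continue
        majority, _ = Counter(valid).most_common(1)[0]
        for i, out in enumerate(outputs):
            if out == majority:
                scores[i] += 1
    # Return the candidate with the highest agreement score.
    return candidates[max(range(len(candidates)), key=scores.__getitem__)]
```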


I’ve worked on this exact problem when extracting feeds from news websites. Yes, calling the LLM each time is costly, so I use the LLM only the first time, to extract robust CSS selectors, and on subsequent runs I just rely on those instead of incurring further LLM cost.
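A minimal sketch of that two-phase flow, using BeautifulSoup (assumed installed) to apply stored selectors. The `ask_llm_for_selectors` call and the selector keys (`item`, `title`, `link`) are hypothetical illustrations of the idea.

```python
# Phase 1 (once per site): LLM proposes CSS selectors, saved to disk.
# Phase 2 (every fetch): apply the saved selectors, no LLM involved.
import json
from pathlib import Path
from bs4 import BeautifulSoup  # pip install beautifulsoup4

SELECTOR_STORE = Path("selectors.json")

def ask_llm_for_selectors(html: str) -> dict:
    # Hypothetical: prompt an LLM with the page and ask for selectors,
    # e.g. {"item": "li.story", "title": "a.title", "link": "a.title"}.
    raise NotImplementedError

def extract_feed(site: str, html: str) -> list[dict]:
    store = json.loads(SELECTOR_STORE.read_text()) if SELECTOR_STORE.exists() else {}
    if site not in store:
        store[site] = ask_llm_for_selectors(html)  # one-time LLM cost
        SELECTOR_STORE.write_text(json.dumps(store))
    sel = store[site]
    soup = BeautifulSoup(html, "html.parser")
    return [
        {"title": item.select_one(sel["title"]).get_text(strip=True),
         "link": item.select_one(sel["link"])["href"]}
        for item in soup.select(sel["item"])
    ]
```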


Thank you! I’m currently working on supporting local LLMs via llama.cpp, so cost won’t be an issue anymore.


Given that the Ollama API is OpenAI-compatible, that should be a drop-in, no?


Not really; I believe it’s missing function calling.

Edit: and grammar as well


Ahh yeah gotcha


I'm working on this problem now. It's possible for some sources, whenever the HTML structure is regular enough that you can map it onto the feature of interest, but it can also happen that the information is hidden within free text, which makes it virtually impossible.


This is a really nice idea. Wonder what the prompt would look like for that.



