Great work! One thing that would be incredibly useful/interesting would be generating a reusable script with an LLM, instead of just grabbing the data. In theory this should result in a massive cost reduction (no need to call the LLM every time) as long as the source markup doesn't change, which would make it sustainable for constant, frequent monitoring.
This approach was studied in a paper on the Evaporate system: https://www.vldb.org/pvldb/vol17/p92-arora.pdf. They had the LLM generate candidate extraction functions from a sampled set of documents, then scored the candidates and aggregated their outputs (via weak supervision) to pick the best ones.
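To make the candidate-selection idea concrete, here's a toy sketch (not the paper's actual algorithm): score each LLM-generated candidate function by how often it agrees with the per-document majority vote across all candidates, and keep the winner. The candidate lambdas and sample documents below are made up for illustration.

```python
from collections import Counter

def best_candidate(candidates, sample_docs):
    """Return the candidate extractor whose outputs most often
    agree with the majority vote over all candidates."""
    scores = []
    for cand in candidates:
        agree = 0
        for doc in sample_docs:
            outputs = [c(doc) for c in candidates]
            majority = Counter(outputs).most_common(1)[0][0]
            if cand(doc) == majority:
                agree += 1
        scores.append((agree, cand))
    return max(scores, key=lambda t: t[0])[1]

# Hypothetical candidates an LLM might have generated:
c1 = lambda d: d.split(":")[1].strip() if ":" in d else ""
c2 = lambda d: d.rsplit(":", 1)[-1].strip()
c3 = lambda d: d[:3]  # a weak candidate

docs = ["price: 10", "price: 20", "price: 30"]
best = best_candidate([c1, c2, c3], docs)
```

Once the winning function is picked on the sample, it can be applied to the rest of the corpus with zero further LLM calls.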
I’ve worked on this exact problem when extracting feeds from news websites. Yes, calling an LLM each time is costly, so I use the LLM only the first time, to extract robust CSS selectors, and on subsequent runs I just rely on those instead of incurring further LLM cost.
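The cache-or-call pattern behind this is simple; here's a minimal stdlib-only sketch. The `fake_llm` stub, the `(tag, class)` "selector" format, and the site/HTML below are all stand-ins: a real setup would call an actual LLM once per site and apply genuine CSS selectors with a library like BeautifulSoup.

```python
from html.parser import HTMLParser

class SelectorExtractor(HTMLParser):
    """Collects text inside elements matching (tag, class) --
    a deliberately simplified stand-in for a CSS selector."""
    def __init__(self, tag, cls):
        super().__init__()
        self.tag, self.cls = tag, cls
        self.depth = 0
        self.matches = []

    def handle_starttag(self, tag, attrs):
        if self.depth:
            self.depth += 1  # inside a match: track nesting
        elif tag == self.tag and self.cls in dict(attrs).get("class", "").split():
            self.depth = 1
            self.matches.append("")

    def handle_endtag(self, tag):
        if self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.matches[-1] += data

def get_selectors(site, cache, llm_infer):
    """Pay the LLM cost only on a cache miss; every later
    crawl reuses the cached selectors for free."""
    if site not in cache:
        cache[site] = llm_infer(site)  # one-time LLM call
    return cache[site]

calls = {"n": 0}
def fake_llm(site):
    # Stand-in for the expensive call that inspects the page once.
    calls["n"] += 1
    return {"headline": ("h2", "title")}

cache = {}
html = '<div><h2 class="title">Breaking news</h2><p>body</p></div>'
for _ in range(3):  # three crawls, but only one LLM call
    tag, cls = get_selectors("example.com", cache, fake_llm)["headline"]
    parser = SelectorExtractor(tag, cls)
    parser.feed(html)
headline = parser.matches[0]
```

The key property is that `calls["n"]` stays at 1 no matter how many times the site is crawled; only a change in the page structure (selectors stop matching) should trigger a fresh LLM call.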
I'm working on this problem now. It's possible for some sources, whenever the HTML structure is regular enough that you can map it to the feature of interest, but it can also happen that the information is buried inside free text, which makes it virtually impossible.