audiala's comments

Part of the issue is selecting the right Wikipedia article. Wikidata offers a way to know for sure that you are querying the LLM with the right data. Also, the Wikipedia txtai dataset is English-only.


Wikidata is such a treasure. There is quite a learning curve to mastering the SPARQL query language, but it is really powerful. We are testing it to provide context to LLMs when generating audio guides, and the results are very impressive so far.
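To give a concrete sense of the kind of query involved, here is a minimal Python sketch that builds a Wikidata SPARQL query for points of interest near a coordinate. The property and service names are real Wikidata conventions (wdt:P625 is "coordinate location", and wikibase:around is the geospatial search service), but the coordinates, radius, and limit are illustrative assumptions:

```python
def poi_query(lat: float, lon: float, radius_km: float = 1.0, limit: int = 20) -> str:
    """Build a SPARQL query for Wikidata items located near (lat, lon)."""
    return f"""
SELECT ?item ?itemLabel ?itemDescription WHERE {{
  SERVICE wikibase:around {{
    ?item wdt:P625 ?loc .
    bd:serviceParam wikibase:center "Point({lon} {lat})"^^geo:wktLiteral .
    bd:serviceParam wikibase:radius "{radius_km}" .
  }}
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en" . }}
}}
LIMIT {limit}
""".strip()

# Example: points of interest within 1 km of downtown Ottawa.
query = poi_query(45.4215, -75.6972)
print(query)
```

The resulting string can be POSTed to the public endpoint at https://query.wikidata.org/sparql; the label service line is what saves you from resolving Q-identifiers to human-readable names yourself.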


I wish there were a way to add results from scientific papers to Wikidata - imagine doing meta-analyses via SPARQL queries.


You totally can! - https://www.wikidata.org/wiki/Q30249683

It's just pretty sparse, so you would need a focused effort to fill out predicates of interest.


Indeed, and hopefully (if there were a structured way of doing it) people might put in that effort as part of doing reviews or meta-analyses, making the underlying data available for others and making it easier to reproduce or update the results over time.


Am I missing something? I do not see any results indicated in the statements of that entity.


Right, such a result would need to be marked with a new predicate (verb), like:

```
Subject   - Transformer's Paper
Predicate - Score
Object    - BLEU (28.4)
```

One of the trickiest things about using a semantic triple store like this is that there are a lot of ways of phrasing the data, so lots of ambiguity. LLMs help in this case by being able to handle cases like having both 'Score' and 'Benchmark' predicates more gracefully, merging the two together.
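To make the ambiguity concrete, here is a toy Python sketch (the paper names, predicate names, and numbers are made up for illustration) of normalising two synonymous predicates onto one canonical name before querying the triples:

```python
# Two triples stating the same kind of fact under different predicates.
triples = [
    ("Attention Is All You Need", "score", ("BLEU", 28.4)),
    ("Some Other Paper", "benchmark result", ("BLEU", 27.3)),
]

# Map synonymous predicates onto one canonical name.
CANONICAL = {"score": "benchmark result", "benchmark result": "benchmark result"}

merged = [(s, CANONICAL[p], o) for s, p, o in triples]
predicates = {p for _, p, _ in merged}
print(predicates)  # every triple now uses the single canonical predicate
```

The hard part in practice is building that `CANONICAL` mapping at Wikidata's scale, which is exactly where an LLM's tolerance for near-synonyms helps.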


One of my favorite things about ChatGPT is that I pretty much never have to write SPARQL myself anymore. I’ve had zero problems with the resulting queries either, except in cases where I’ve prompted it incorrectly.


Yeah, it works so well. I wonder if it's just a natural fit, given that the attention mechanism and graph databases share some common semantic-triple foundations.


Any recommendations to learn SPARQL? I've looked into it and decided against it about as often as Nix.


How would you like to specify your constraints? We are working on this exact problem (not there yet).


We are working on a similar application and have made the same observation: external data is required to avoid hallucinations, especially if you go to lesser-known places. It's absolutely the case with GPT-3.5 and often with GPT-4. We will release our new content in the next few days. We are now weighing eating the cost of expensive TTS against going with a cheap option for okay results. Can I ask which option you used for TTS?


Hello! I know your guys' work! I try to keep up with all the competitors :) Feel free to reach out to me at rob @ summer dot ai and I'd be happy to talk shop.

For anyone else who is interested in this question: I've tried a whole bunch of the TTS services and found that Microsoft and AWS are the best of the standard providers, IMHO. These are also services that tend to have startup credits available, so I use a mix of the two - I try never to rely on just one provider. I've met with the Eleven Labs folks, and some of their demos of the V2 stuff that's coming are really amazing, but latency and pricing might rule them out as an option for the time being.


Thanks for the answer Rob, we just reached out :) We arrived at the same conclusion; we mostly rely on AWS Polly so far. Hopefully the pricing of the better alternatives comes down significantly in the next few months. We even tried running different open-source solutions, but we could not find anything SOTA.
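For reference, a Polly call from Python via boto3 looks roughly like this (the voice, format, and engine choices here are illustrative assumptions, and actually running `synthesize` requires boto3 installed plus AWS credentials configured):

```python
def polly_request(text: str, voice: str = "Joanna") -> dict:
    """Keyword arguments for Polly's SynthesizeSpeech API call."""
    return {
        "Text": text,
        "OutputFormat": "mp3",
        "VoiceId": voice,
        "Engine": "neural",  # neural voices cost more than "standard"
    }

def synthesize(text: str) -> bytes:
    import boto3  # deferred import; needs AWS credentials to run
    polly = boto3.client("polly")
    resp = polly.synthesize_speech(**polly_request(text))
    return resp["AudioStream"].read()  # MP3 bytes
```

The neural-vs-standard engine flag is where most of the cost trade-off discussed above lives.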


Even if they create a great UX/UI? GPT would be like the engine of the car; there is still the whole structure to build around it.


At Audiala, we use a heuristic approach that leverages existing curated lists of points of interest and generative AI to produce the relevant content. The Ottawa food bank didn't make it onto our list :)

