Famous french radio program about lives and experience, like the moth meets Bourdieu. For this episode, they wrote and made the voices all in ai, relating to the Paris ai submit. The episode is used to trigger réflexions about gen ai.
On the prompt side, it's very simple, and can probably be done in a variety of ways. How we did it is to prepare a prompt with multiple "user" messages. The first one gives the instruction
you are given a reference and three candidates, which one of the candidates do you think is a match to the reference? Only output its identifier or a code when none is found
Not exactly that but something along those lines.
Then one "user" message per car (reference + candidates) with image + text indicating the type (reference or candidate) and an identifier (can be as simple as the index for the candidates).
Poster here. We would have loved that, and it was one of our first proposal - a QR code or some kind of marker. However, the client is understandably very controlling on the aesthetics of their wall as a central element of their scenography. We would have pushed for it again in the last resort, but would probably have lost the contract.
This is completely offtopic, but I would bet it was a government-funded museum.
A reasonable institution would have worked with you to find an acceptable compromise, something much easier to implement with a small sacrifice of aesthetics.
Anyway, great work, and thank you for taking the time to share it!
Really? I would much less expect a government museum to be particular about aesthetics. Privately run museums/collections/exhibitions on the other hand tend to have very finicky owners -- after all, they're putting up their own money to achieve their vision, and so of course they tend to not want to compromise on how it might look.
First time for me posting this kind of story - I thought it would make an interesting case on solving a hard computer vision problem with a crafty product engineer team.
Just thinking that. Spend a few minutes trying to have chatgpt generate some images with Dall-E 3. Flux would probably be better to get all the specific details but ya
Thanks for sharing. Interesting approach. As other commenters mentioned, article could do well with some hypothetical images. Maybe on a follow-up blog post? Also since you mentioning your Company's name you missing an opportunity for marketing by not providing a link.
I just loved the wooden spirograph thing I got my 5yo daughter for Christmas (she does too, what fun). But then I thought making it an app to start exploring how those shapes work with her. And because it's 2024 I just asked an AI (here bolt.new) to build it, and refine by prompting. Thought someone else might enjoy it.
I use supermaven and cline with my own API key, a setup superior to cursor imo. Tried to go back to gh copilot yesterday but couldn't bear it for a full workday, and reverted to my previous arrangement.
I am interested, but why should I use this one over jina ai reader (which is also free) or firecrawl, or the ten other puppeteer + readability + turndown pipeline (or even a AWS lambda doing the same) ? This is not sarcastic I am genuinely looking for something fresh in the field.
Interesting but we process documents before embedding them, and have specific requirements for the embedder.
Having developed a couple of page to markdown myself, I think the bigger challenge is to make sense of so many pages that rely on spacial organisation of information that only makes sense to human, or even presence of images. One way to do it is to render the page as an image and extract data with a vision llm. But you do need heuristic on when to do classic extraction and when to use vision, plus get rid of cookie banner and overlays. This is more complex and costly, but have real business value, for the one that can pull it off.
We, as many players, have custom pipelines on embedding. We don't split docs based on chunk size but do semantic chunking and chunk augmentation. We embed everything with two embeddings services to always have a fallback if one provider is not available.
If I were in your shoes I would not think embedding and inserting in a vector store would be my responsibility, especially since there are so many different stores on the market.
reply