dtjohnnyb · 2024-11-24T11:42:11 1732448531

I've had good results summarizing my documents with a large-context model, then embedding those summaries with a standard embedding model (e.g. e5).

This way I can tune what aspects of the doc I want to focus retrieval on, it's easier to determine when there are any data quality issues that need to be fixed, and the summaries have turned out to be useful for other use cases in the company.
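A minimal sketch of the summarize-then-embed flow described above. Both `summarize` and `embed` are hypothetical stand-ins, not real model calls: in practice the first would prompt a large-context LLM and the second would call an embedding model such as e5. The `focus` parameter is an assumed way of steering which aspect of the document the summary (and therefore retrieval) emphasizes.

```python
import hashlib
import math


def summarize(doc, focus):
    """Stand-in for a large-context LLM call (hypothetical).

    A real version would prompt the model to summarize `doc` with an
    emphasis on `focus`; this toy keeps the sentences that mention it.
    """
    sentences = [s.strip() for s in doc.split(".") if s.strip()]
    kept = [s for s in sentences if focus.lower() in s.lower()]
    return ". ".join(kept or sentences[:1]) + "."


def embed(text, dim=64):
    """Stand-in for an embedding model (hypothetical).

    Hashes each word into one of `dim` buckets and L2-normalizes the counts.
    """
    vec = [0.0] * dim
    for word in text.lower().split():
        word = word.strip(".,!?")
        if word:
            bucket = int(hashlib.md5(word.encode()).hexdigest(), 16) % dim
            vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def cosine(a, b):
    # Vectors from embed() are already unit length, so the dot product suffices.
    return sum(x * y for x, y in zip(a, b))


def build_index(docs, focus):
    """Map doc id -> (summary, summary embedding). Only summaries are embedded."""
    index = {}
    for doc_id, text in docs.items():
        summary = summarize(text, focus)
        index[doc_id] = (summary, embed(summary))
    return index


def search(index, query, k=1):
    """Return the ids of the k summaries closest to the query."""
    q = embed(query)
    ranked = sorted(index, key=lambda d: cosine(q, index[d][1]), reverse=True)
    return ranked[:k]
```

Keeping the summary next to its vector in the index is what makes the data quality checks easy: the text that was actually embedded can be read directly.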

tinyhouse · 2024-11-24T12:57:16 1732453036

Agreed. Especially if you're going to call an API anyway, you can call something cheaper than this embeddings model, like 4o-mini, to summarize, then use a small embeddings model fine-tuned for your needs locally.

I was critical of these guys before (not of the quality of their work, but of building a business around embeddings). This work seems interesting though, and I might even give it a try, especially if they provide a fine-tuning API (is that on the roadmap?).

dtjohnnyb · 2024-11-09T13:26:24 1731158784

I was trying to do this recently for web page summarization. As said below, the token counts would end up over the context length, so I trimmed the HTML to fit just to see what would happen. I found that the LLM was able to extract information, but it would very commonly start trying to continue the HTML blocks that had been left open in the trimmed input. Presumably this is due to instruction tuning on coding tasks.

I'd love to figure out a way to do it though; it seems to me that there's a lot of rich description of the website in the HTML.
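One possible workaround, sketched here as an assumption rather than something tried in the thread: instead of truncating raw HTML (which leaves tags open), parse the page first, keep the title, meta description, and visible text, and only then cut to length. The names `TextExtractor` and `page_to_prompt_text` are made up for this example, and `max_chars` is a crude character-based proxy for a token budget.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the title, meta description, and visible text of a page."""

    SKIP = {"script", "style", "noscript"}  # content the model should not see

    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = ""
        self.chunks = []
        self._skip_depth = 0
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag == "title":
            self._in_title = True
        elif tag == "meta" and attrs.get("name") == "description":
            self.description = attrs.get("content") or ""

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1
        elif tag == "title":
            self._in_title = False

    def handle_data(self, data):
        text = " ".join(data.split())  # collapse runs of whitespace
        if not text or self._skip_depth:
            return
        if self._in_title:
            self.title += text
        else:
            self.chunks.append(text)


def page_to_prompt_text(html, max_chars=4000):
    """Return tag-free text for a prompt, truncated after parsing."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    header = f"Title: {parser.title}\nDescription: {parser.description}\n"
    return (header + " ".join(parser.chunks))[:max_chars]
```

Because the truncation happens on plain text, the model never sees an unclosed tag to continue. The trade-off is that structural hints in the markup (headings, link targets, attributes) are dropped unless the extractor is extended to keep them.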

dtjohnnyb · on Oct 12, 2023

A couple of more introductory books that come at it from the point of view of "someone who can code" are:

- https://greenteapress.com/wp/think-stats-2e/ (and the similar Think Bayes if you enjoy this one)
- https://nostarch.com/learnbayes

Can second Statistical Rethinking though, if you have the basics of stats and want to learn it again from a very different, more causal/Bayesian point of view.

dtjohnnyb · on May 31, 2023

The book Linchpin talks about this and calls it "the resistance": the feeling of avoidance you get when you're close to shipping something, or even when sitting down to start an ambitious project. I've found it useful to have a name for it so I can recognize when I'm falling prey to "the resistance" and get myself to stop procrastinating or to "just ship it".

SyneRyder · on May 31, 2023

I suspect Linchpin / Godin has "borrowed" the idea of Resistance from Steven Pressfield's excellent book The War Of Art from 2002. That book is aimed at writers, but applies to anyone creative.

dtjohnnyb · on Oct 26, 2022

Right, I was confused when the article said that the parameters other than the moon and sun ones track other astronomical variables. Surely they are also (or potentially primarily) modeling geological and hydrological variables.

defrost · on Oct 27, 2022

There are two intertwined "sets" of effects to be modeled:

Primarily Earth + Moon, with a secondary twist of Sun, and a layered decline of precession | orbit wobble, lesser effects (the astro forces),

And then the ground effects: shaping around headlands, sloping of seafloors, funnelling through channels, etc., with a rinse and repeat cycle for sea areas that are "chained" backwards from the main flux via multiple bays and estuaries (internal bodies of water large enough to have their own tides via moon gravity while also connected with a delay to an outer ocean via a long channel, etc.).

Fun stuff - I primarily worked with exploration geophysics but dabbled a little in tides and ocean levels across Australia.
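A rough illustration of how the two sets can combine, using the standard harmonic form of tide prediction (an assumption here; it is not necessarily the model in the article). The astronomical forcing fixes the constituent speeds, while the ground effects show up as a site-specific amplitude and phase lag for each constituent. The `SITE` values below are hypothetical, and real predictions also apply nodal corrections and reference-time arguments that this sketch omits.

```python
import math

# Angular speeds in degrees per hour for four standard tidal constituents.
SPEEDS_DEG_PER_HOUR = {
    "M2": 28.9841042,  # principal lunar semidiurnal
    "S2": 30.0,        # principal solar semidiurnal
    "K1": 15.0410686,  # lunisolar diurnal
    "O1": 13.9430356,  # lunar diurnal
}

# Hypothetical site constants: (amplitude in metres, phase lag in degrees).
# In practice these are fitted to observations at each location.
SITE = {
    "M2": (1.20, 40.0),
    "S2": (0.40, 75.0),
    "K1": (0.15, 120.0),
    "O1": (0.10, 100.0),
}


def tide_height(t_hours, site=SITE, mean_level=0.0):
    """Sum of cosines: mean level plus one term per constituent."""
    height = mean_level
    for name, (amplitude, phase_lag) in site.items():
        angle = math.radians(SPEEDS_DEG_PER_HOUR[name] * t_hours - phase_lag)
        height += amplitude * math.cos(angle)
    return height
```

A chained bay or estuary would get its own table of amplitudes and phase lags, typically with larger lags than the open coast it connects to.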

dtjohnnyb · on Sept 2, 2022

"The Captain Class" is a great book that glorifies the "water carriers" of a bunch of successful sporting dynasties.

It certainly gave me a lot of encouragement to try to develop more of the traits shown by those great captains.

dtjohnnyb · on May 23, 2022

Slack groups have filled the meetup space in my life. mlops.community and Locally Optimistic are two of the best for what it sounds like you're looking for.

dtjohnnyb · on Feb 23, 2022

One downside of Milvus is that version 1 doesn't do filtering (necessary for most search applications) and version 2 is significantly slower. Google's vector nearest-neighbors offering, Weaviate, and Vespa are much better options if you're expecting to extend to more realistic workloads.
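For anyone unsure what "filtering" means here: it is restricting the nearest-neighbor search to items whose metadata matches some condition. The sketch below is an exact, brute-force version for illustration only; it does not reflect how any of the engines named above implement it, and the item layout (`id`, `vector`, `meta`) and the `where` argument are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def filtered_search(items, query_vec, where, k=3):
    """Pre-filter on metadata, then rank the survivors by similarity.

    items: list of dicts with keys 'id', 'vector', and 'meta'.
    where: dict of metadata field -> required value (all must match).
    """
    candidates = [
        item for item in items
        if all(item["meta"].get(field) == value for field, value in where.items())
    ]
    candidates.sort(key=lambda item: cosine(query_vec, item["vector"]), reverse=True)
    return [item["id"] for item in candidates[:k]]
```

Filtering after an unfiltered top-k lookup instead can return fewer than k results when most neighbors fail the condition, which is one reason built-in filter support matters for search applications.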

dtjohnnyb · on Feb 4, 2022

From an accessibility perspective it's a disaster too. My granny is visually impaired, but loves watching TV. She's been able to get by for decades by memorizing the commands needed to get where she wants to go (she had a list of Nintendo cheat-code-like instructions for getting to the various channels she wanted on the satellite, e.g. down-down-left-left-enter gets her to her show).

We tried to get her into Netflix, but the menus change far too often for this strategy to be of any use. Of course you can use the screen reader/narrator, but you can imagine how frustrating it is trying to find the carousel you're looking for by waiting for the screen reader to tell you. It's frustrating enough doing it visually!!

dtjohnnyb · on Jan 22, 2022

Interestingly, Leitrim and other counties in the north west he mentions have been high on the list of places people have escaped to from Dublin during the pandemic. Many people are now living there and working remotely for Dublin companies, and there are far too _few_ houses there now, and house prices have skyrocketed.