Damn, looks like you beat NotebookLM by a year and a half!
Is the code for this available? I'm particularly curious how you did the multiple speakers and voices.
NotebookLM has the issue that they keep switching sides, like one will be the student and the other the teacher on a subject, but then they'll suddenly switch in a way that makes no sense.
I did not release the code, but it's incredibly basic, and I believe OP's is much the same.
You collect N links from the HN API with any heuristic you want, then scrape those URLs - preferably using Puppeteer-based tooling or an online equivalent (think Jina).
I then ran each URL's content through an LLM to get a summary, then from all the results asked an LLM to create the conversation (and gave it a tone). Then I decided on the voices and characters and fed each turn into 11labs (or any TTS). Finally, I concatenated all the audio parts and added music and effects.
If I remember correctly, mine could perform all of that from a single Cloudflare Worker. The catch is it can get a bit pricey because of the TTS. I remember toying with making it a product (podcast everything) and quickly discovered there are a couple of companies already offering this.
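For anyone curious, here's a rough sketch of that kind of pipeline, assuming the public HN Firebase API and Jina's reader endpoint; llm and tts are hypothetical stand-ins for whatever LLM and TTS client you actually use:

    async function llm(prompt: string): Promise<string> {
      // Hypothetical: call whatever chat-completion API you prefer.
      throw new Error("not implemented");
    }

    async function tts(voiceId: string, text: string): Promise<ArrayBuffer> {
      // Hypothetical: call ElevenLabs or any other TTS here.
      throw new Error("not implemented");
    }

    async function buildEpisode(storyCount = 5): Promise<ArrayBuffer[]> {
      // 1. Collect N links from the HN API (top stories as a simple heuristic).
      const ids: number[] = await (
        await fetch("https://hacker-news.firebaseio.com/v0/topstories.json")
      ).json();
      const stories = await Promise.all(
        ids.slice(0, storyCount).map(async (id) => {
          const res = await fetch(`https://hacker-news.firebaseio.com/v0/item/${id}.json`);
          return (await res.json()) as { title: string; url?: string };
        })
      );

      // 2 + 3. Scrape each URL via Jina's reader, then summarize it with an LLM.
      const summaries: string[] = [];
      for (const story of stories.filter((s) => s.url)) {
        const content = await (await fetch(`https://r.jina.ai/${story.url}`)).text();
        summaries.push(await llm(`Summarize this article:\n\n${content}`));
      }

      // 4. Ask an LLM to turn all summaries into a two-host conversation with a tone.
      const script = await llm(
        "Write a light, curious two-host podcast dialogue covering these stories. " +
          'Prefix each turn with "ALICE:" or "BOB:".\n\n' +
          summaries.join("\n\n")
      );

      // 5. Feed each turn to the TTS with the matching voice. Concatenating the
      //    clips and adding music/effects is left to your audio tooling.
      const voices: Record<string, string> = { ALICE: "voice-a", BOB: "voice-b" };
      const clips: ArrayBuffer[] = [];
      for (const line of script.split("\n").filter(Boolean)) {
        const [speaker, ...rest] = line.split(":");
        clips.push(await tts(voices[speaker.trim()] ?? "voice-a", rest.join(":").trim()));
      }
      return clips;
    }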
NotebookLM is slightly different on the TTS front. I think they are using the amazing model Google showed off a year or so ago (without giving it public access) that can generate actual multi-speaker conversations with "hums", interruptions, and people talking over each other.
Interesting, I actually did a search before submitting mine, but I narrowed it down to the last year only. Yours being 2y/o didn't show up. You were ahead of your time!
If you do end up preprocessing the GeoTIFF, and if you already have the pipeline to serve terrain elevation to users, I guess you could encode only the difference between lidar and radar in your tiles, so you'd have just the tree data on top of the terrain data you already serve. The objects you are encoding and the precision you need could fit in as few as 4 bits, with lots of zeros that could be compressed away? Just a brainstormy kind of comment.
This is a good idea. There are two popular encodings of elevation data into RGB tiles. Neither is optimal in size, because their value ranges need to accommodate bathymetric data (negative elevations for mapping the sea floor):
height = -10000 + ((R * 256 * 256 + G * 256 + B) * 0.1) [mapbox/maptiler]
height = (R * 256 + G + B / 256) - 32768 [mapzen terrarium]
If you only care about elevations above sea level (0-8848 meters), you can pack the data into just two bytes while maintaining 0.13 meter precision (Mapbox's precision is 0.1).
This is the encoding I'm going to use. I've already trialed it and it saves space (I'm not sure about processing time).
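For illustration, a minimal sketch of that two-byte packing, assuming heights clamped to 0-8848 m and quantized so the full 16-bit range gives ~0.135 m steps, stored in the R and G channels:

    const MAX_HEIGHT = 8848;          // highest elevation we care about, in meters
    const STEP = MAX_HEIGHT / 65535;  // ~0.135 m per unit of the 16-bit value

    function encodeHeight(meters: number): [number, number] {
      const clamped = Math.min(Math.max(meters, 0), MAX_HEIGHT);
      const v = Math.round(clamped / STEP); // 0..65535
      return [v >> 8, v & 0xff];            // [R, G]; B stays free
    }

    function decodeHeight(r: number, g: number): number {
      return (r * 256 + g) * STEP;
    }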
The best encoding would be to store the minimum elevation for the entire tile in the header and then just store the delta between that minimum and the elevation of each pixel. It would be the most space-efficient, but it would involve loading the whole tile into memory to find the minimum elevation (which is less efficient than streaming and encoding one pixel at a time).
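And a sketch of that per-tile delta idea, again assuming 16-bit deltas and the same ~0.135 m step; the first pass over the tile is exactly the in-memory cost mentioned above:

    const DELTA_STEP = 8848 / 65535; // same ~0.135 m quantization as above

    function encodeTile(heights: Float32Array): { minHeight: number; deltas: Uint16Array } {
      // First pass: find the tile's minimum elevation (needs the whole tile in memory).
      let minHeight = Infinity;
      for (const h of heights) minHeight = Math.min(minHeight, h);

      // Second pass: store only the offset from that minimum for each pixel.
      const deltas = new Uint16Array(heights.length);
      for (let i = 0; i < heights.length; i++) {
        deltas[i] = Math.round((heights[i] - minHeight) / DELTA_STEP);
      }
      return { minHeight, deltas };
    }

    function decodeTile(minHeight: number, deltas: Uint16Array): Float32Array {
      const out = new Float32Array(deltas.length);
      for (let i = 0; i < deltas.length; i++) out[i] = minHeight + deltas[i] * DELTA_STEP;
      return out;
    }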
Uhm, I really need to know more about you/what you know, because this is a passion of mine as I seek the next thing I want to focus on, and the terminology you use is exactly where I want my knowledge to be.
Please point me at what I should study to be proficient in topological data science (which is what I want to study).
Are you sure you mean topological data science? I know that there are topological methods for classifying high-dimensional data structures, but this discussion is mostly geographical/topographical. Yes, it does describe a surface, but there's a fundamental assumption that all objects are either on a plane or a sphere.
edit: If you mean GIS (geographic information systems/science), there are plenty of undergraduate courses strewn across GitHub. IMO, the R geospatial ecosystem is more mature than its Python counterpart, but both are very usable.
Feels like this newsletter is actually written by an LLM. It's full of repetition and flat assertions stitched together with high-intensity, random connectors. "Make no mistake, [something incredibly common and already expressed here]", "Our own take on this is [another super-cliche platitude here]".
I was about to say this. I can relate a hundred percent to the modus operandi, and to be honest I've just accepted that I tend to lose interest, and that's fine.
Moreover, I came to realize that what was really interesting to me was the initial dreaming, planning, and obsessing phase rather than the long execution.
I tend to discover and learn what I was seeking in the early phase. After that the need is less pressing.
Accepting this, I made it my profession to be an architect, planner, and discoverer, not a regular maintainer of things.
Note that I also often revisit old stuff, and the spark might reignite. I have completed projects over a span of years, picking them back up after a long time.
I think people are blown away by the quality of the audio narration of this one, not by the idea or content itself. AWS Polly sounds like the current generation of artificial voice we are used to.
Yes, this is a very similar technique to the one I have been using, and it works great. One thing that worked well for me was to use safeParse instead of parse. Then, if the output doesn't pass validation, you can retry by passing in the JSON object and the validation error messages. You could also use tricks like starting with a smaller model, then trying larger models if you hit a validation failure. Not a great approach for real-time chat, but very useful when you need high-quality results.
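Roughly what that retry loop can look like with Zod's safeParse; askModel and the model names are hypothetical placeholders for your own client:

    import { z } from "zod";

    // Hypothetical helper that returns the model's raw JSON string.
    declare function askModel(prompt: string, model: string): Promise<string>;

    const Result = z.object({
      title: z.string(),
      tags: z.array(z.string()),
    });

    async function extract(prompt: string): Promise<z.infer<typeof Result>> {
      // Start with a cheaper model, escalate only on validation failure.
      const models = ["small-model", "large-model"];
      let feedback = "";
      for (const model of models) {
        const raw = await askModel(feedback ? `${prompt}\n\n${feedback}` : prompt, model);
        let value: unknown;
        try {
          value = JSON.parse(raw);
        } catch {
          feedback = `Your previous output was not valid JSON:\n${raw}`;
          continue;
        }
        const parsed = Result.safeParse(value);
        if (parsed.success) return parsed.data;
        // Retry with the offending JSON plus Zod's validation errors.
        feedback = `Your previous output:\n${raw}\n\nValidation errors:\n${parsed.error.message}`;
      }
      throw new Error("No model produced valid output");
    }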