
Do you have any resources you'd recommend for representing subsections? I'm currently prototyping a note/thoughts editor where one feature is suggesting related documents/thoughts (think linked notes in Obsidian), and I'd like it to suggest subsections, not only full documents.


Sorry, no good references offhand. I’ve had to help write and generate public docs in DocBook in the past, but I'm no expert on editors, NLP, or embeddings; I've just hacked around with some tools for my own note-taking. My assumption is you'll want to use your existing markup structure, if you have it. Or naively split on paragraphs with a tool like spaCy. Or get really fancy and use dynamic ranges: something like an accumulation window that aggregates adjacent sentences based on their individual similarity, breaks when the total size gets too large or a sentence is too dissimilar, and then treats that aggregate as the range to “chunk.”
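The accumulation-window idea can be sketched roughly like this. It's a minimal sketch under loose assumptions, not something from the comment above: all function names are hypothetical, the regex sentence splitter stands in for a real segmenter like spaCy, the bag-of-words cosine similarity stands in for real sentence embeddings, and the `min_similarity` / `max_chars` thresholds are arbitrary defaults you'd want to tune.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    # Naive regex splitter (assumption): a stand-in for proper sentence segmentation.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def bow_vector(sentence):
    # Bag-of-words term counts; a placeholder for a real sentence embedding.
    return Counter(re.findall(r"[a-z0-9']+", sentence.lower()))


def cosine(a, b):
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def chunk_sentences(sentences, min_similarity=0.2, max_chars=500):
    """Greedily grow a window of adjacent sentences; close it when the next
    sentence is too dissimilar to the window or would make it too long."""
    chunks = []
    window = []              # sentences in the current chunk
    window_vec = Counter()   # summed term counts of the current chunk
    for sentence in sentences:
        vec = bow_vector(sentence)
        if window:
            # Approximate size: character count of the sentences, ignoring joining spaces.
            too_big = sum(len(s) for s in window) + len(sentence) > max_chars
            too_different = cosine(window_vec, vec) < min_similarity
            if too_big or too_different:
                chunks.append(" ".join(window))
                window, window_vec = [], Counter()
        window.append(sentence)
        window_vec.update(vec)
    if window:
        chunks.append(" ".join(window))
    return chunks


if __name__ == "__main__":
    note = ("Cats purr when happy. Happy cats purr loudly. "
            "Rust compiles to native code. Native code from Rust is fast.")
    for chunk in chunk_sentences(split_sentences(note)):
        print(chunk)
```

On that sample note the two cat sentences end up in one chunk and the two Rust sentences in another, since the third sentence shares no words with the first window. Comparing each new sentence against the whole window (rather than only the previous sentence) makes the chunk boundaries a bit less sensitive to a single off-topic sentence; swapping `bow_vector` and `cosine` for real embeddings shouldn't change the windowing logic.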


Thanks for the elaborate and helpful response. I'm also hacking on this as a personal note taking project and already started playing around with your ideas. Thanks!

