
The search uses Elasticsearch 7 for full-text search. It's been extremely fast and has worked very well. You're right: court data is scattered across many different systems and needs to be aggregated, which is a difficult process.



Are you using freelaw's code to scrape all the different servers? Why are there no contact details on the site? I don't understand the mystery and black ops nature of this thing. It feels like there is some sort of conspiracy here that I've yet to uncover!


There are, I think, about 5 million opinions from that project, yes. I wouldn't say it's black ops; feel free to contact me on reddit.


How much RAM does that use? What's the latency? Is it sharded? Is it a cluster? So many questions.


There are 2 search boxes going. One stores the search index without source, and the other stores the source, which is used only for highlighting. Searches usually take under 200ms, and SRP and individual pages usually take less than 20ms. The 2 ES nodes are not formally part of a single cluster because of the index storage difference. Another box uses a traditional LAMP setup. Feel free to send a message on reddit if you're interested in more detail.
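For anyone curious what a split like that can look like, here is a minimal sketch, not the site's actual configuration. It assumes the search node disables `_source` in its mapping and the highlight node keeps it; the field name `text` and both helper functions are hypothetical, since the real schema isn't given here.

```python
# Hypothetical single-field schema; the real field names are not given in the thread.
SEARCH_INDEX_BODY = {
    "mappings": {
        "_source": {"enabled": False},  # smaller index: original document is not stored
        "properties": {"text": {"type": "text"}},
    }
}

HIGHLIGHT_INDEX_BODY = {
    "mappings": {
        "_source": {"enabled": True},  # source kept so the highlighter has text to read
        "properties": {"text": {"type": "text"}},
    }
}


def search_request(query, size=10):
    """Request body for the source-less node: only IDs and scores come back."""
    return {"query": {"match": {"text": query}}, "_source": False, "size": size}


def highlight_request(query, doc_ids):
    """Request body for the highlight node, restricted to IDs the search node returned."""
    return {
        "query": {
            "bool": {
                "must": {"match": {"text": query}},
                "filter": {"ids": {"values": doc_ids}},
            }
        },
        "_source": False,
        "highlight": {"fields": {"text": {}}},
    }
```

The idea under these assumptions: the first request finds matching document IDs on the lean index, and the second asks the other node for snippets for just those IDs.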


How large is the index?

How do you manage that between RAM or SSD?


Search - Index is ~373GB. AMD Epyc 7371 - 16c/32t - 3.1 GHz/3.8 GHz. 512 GB ECC 2400 MHz. 2×1.92 TB SSD NVMe

Highlight - Index is ~620GB. Xeon-D 2141I - 8c/16t - 2.2 GHz/3 GHz. 64 GB ECC 2133 MHz. 2×1.92 TB SSD NVMe

Search and highlighting are handled asynchronously from a queue.
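One way to picture queue-driven search plus highlighting is the rough sketch below. It is only a guess at the shape, not the site's code: `search_ids` and `highlight` are hypothetical stand-ins for calls to the two boxes, and the in-process `asyncio.Queue` is an assumption (a real setup may well use an external queue).

```python
import asyncio


async def search_ids(query):
    # Hypothetical stand-in for a request to the search box.
    await asyncio.sleep(0)
    return [f"{query}-doc-{i}" for i in range(3)]


async def highlight(query, doc_ids):
    # Hypothetical stand-in for a request to the highlight box.
    await asyncio.sleep(0)
    return {doc_id: f"...<em>{query}</em>..." for doc_id in doc_ids}


async def worker(queue, results):
    # Pull queries off the queue forever; cancelled by run_demo when the queue drains.
    while True:
        query = await queue.get()
        try:
            ids = await search_ids(query)
            results[query] = await highlight(query, ids)
        finally:
            queue.task_done()


async def run_demo(queries, n_workers=2):
    queue = asyncio.Queue()
    results = {}
    workers = [asyncio.create_task(worker(queue, results)) for _ in range(n_workers)]
    for q in queries:
        queue.put_nowait(q)
    await queue.join()  # wait until every queued query has been processed
    for w in workers:
        w.cancel()
    await asyncio.gather(*workers, return_exceptions=True)
    return results
```

Calling `asyncio.run(run_demo(["tort", "habeas"]))` returns a dict keyed by query, each value mapping document IDs to a snippet string.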


Awesome insight, and site. Thanks



