Show HN: Full text search on 630M US court cases (judyrecords.com)
698 points by richardbarosky on Feb 19, 2022 | 269 comments



A lot of commentators are missing the point. Caselaw is what lawyers rely on to determine what the law is. For years, there was a monopoly: the privately held West Corporation in Minnesota. In the 1970s, the field became a duopoly with the advent of Lexis. West eventually sold out to Thompson; the company is now Thompson-West. Computer-assisted legal research was extremely expensive. The Air Force set up FLITE, but efforts to access its database of cases failed. Manual research is effective, but paper publication meant you ran the risk of not knowing yesterday's results. In the '90s, with the advent of the Internet, a lot of new players moved onto the scene, some more successful than others. Public-interest organizations sued for access to the cases, which, after all, were all in the public domain. A Washington-based group called the caselaw the "Crown Jewels" of this effort, though EDGAR filings probably have more significance for the public. Time has provided a solution; access to anything older than the most recent 400 volumes of the Federal Reporter is really not needed, as any Thompson-West salesman will tell you. But these efforts to open caselaw to the public are not scammy and have nothing to do with shaming. In the United States, these cases state the law that governs our lives. I'm glad they've succeeded in the effort.


A minor correction: West was bought by Thomson (no "p"), which later merged with Reuters to become Thomson Reuters.


Do you have any good write-up on this? I've always been interested in how this space is still locked up.


I recently had some criminal charges expunged, and I notice they show up here. Is there any way to request removal of court records which are no longer publicly available from the originating court?


This is a situation there aren't any great solutions for currently. Can you message me on Reddit with the link to check?


I'm not a lawyer:

In the absence of a reporting mechanism for issues like this, I'd suggest at least a notice / message alongside results to indicate that they may not reflect the current state of official and amended records.

(I think you may be wise to take this issue fairly seriously; there's a risk of people considering the search engine to be an authority in itself -- which, to be fair, is already a risk for any search engine, but since this one is more domain-focused, it's possible that some users could overdevelop a sense that the results are accurate and complete)


This is stated in simple language on the terms page, which is linked at the top/middle of every page. You have to decide between putting the same text on every page, in a high-visibility place, or in a low-visibility place. I opted for the second to make sure it's clear.


Do most people read and comprehend terms pages before using the information they discover from search engines? (I don't know)


Absolutely not. A vanishingly tiny percentage of users ever click on the terms link.


Indeed, so the only good option would be to only search public records.

Everything else should not be searchable.


It’s good that this is listed in the terms, but this raises two concerns:

1) People don’t read terms. Should they? Yes. Do they? No.

2) This language is more like a disclaimer than a term of use. I would not assume that disclaimers about the accuracy of the result would be found in the terms.

So yes, the terms link is prominent, but no, I don’t think it addresses this issue.


Wouldn't it be easier for people to just sue individually, not so much to win as to shut down your operation under the legal costs incurred?


I tried your HN handle on Reddit; it says the user does not exist.


I'm curious as well. Reddit me at napoleongoldfinger if that's cool?


aoeusnth48


Aren't there laws protecting people against this? Expunged records should not be visible, right?


If a record is made public, it's always part of the public record. If a case has been expunged, those results are removed though because it's the right thing to do.


There are in Europe (e.g. the GDPR), but not in the US, AFAIK.


Neat. I'll add this to my sources on case law -- another one I've come across is https://case.law/

Per my close friend, the value of these (or, why people subscribe to LexisNexis) isn't solely the texts, but the cross referencing. It would be really cool to see that get implemented (and no doubt a non-trivial problem!).

How do you source your case inputs, as it is bigger than PACER?


CourtListener is a free source that does this very well for high-level courts (i.e., US Supreme Court, Federal Courts, State Courts of Last Resort/State Supreme Courts).

For that, you have to detect references to cases, which is a difficult problem in itself, and CourtListener's search ranking also takes into account the citation weight of certain cases. This generally works well, but my understanding is that sometimes a not-so-important case can end up with many citations. And if a case with many citations is overturned completely or partially, that also complicates deciding which cases are most relevant in search results.
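Detecting case references usually starts with pattern-matching reporter citations. The sketch below is a minimal illustration of that idea plus a naive citation count; it is not CourtListener's actual method. The regex, the handful of reporters it covers, and the function names are all assumptions made for this example.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny pattern for "<volume> <reporter> <page>" citations
# in a few federal reporters. Real citators recognize hundreds of reporters,
# "id." and short-form cites, parallel citations, and subsequent history.
CITATION_RE = re.compile(
    r"\b(\d{1,4})\s+"
    r"(U\.S\.|S\.\s?Ct\.|F\.(?:2d|3d|4th)?|F\.\s?Supp\.(?:\s?[23]d)?)"
    r"\s+(\d{1,5})\b"
)


def normalize_reporter(reporter: str) -> str:
    # Collapse spacing variants so "S. Ct." and "S.Ct." count as the same reporter.
    return "".join(reporter.split())


def extract_citations(text: str) -> list:
    """Return citations found in text, normalized like '410 U.S. 113'."""
    return [
        f"{volume} {normalize_reporter(reporter)} {page}"
        for volume, reporter, page in CITATION_RE.findall(text)
    ]


def citation_counts(opinions: dict) -> Counter:
    """Count how many distinct opinions cite each case: a crude 'citation weight'.

    This ignores whether the cited case was later overturned, which is
    exactly the complication a real ranking has to deal with.
    """
    counts = Counter()
    for text in opinions.values():
        counts.update(set(extract_citations(text)))  # one vote per citing opinion
    return counts
```

A real citator would also need to resolve each citation to a specific case and track its subsequent history (overruled, distinguished), since raw counts can overweight cases that are cited often but matter little.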

The data source is provided for each case. In some cases, a direct reference/link is provided.


lpf.io


Cool, I guess. I'm not really a fan of anything that can be used to further persecute people who have already paid their debts to society. I know this is public information, but the information itself has been relatively opaque for generations.

Beyond mild curiosity, the only paying customers for a service like this will be groups like employers, schools, creditors, landlords etc. It removes the pain around paid background checks, and it includes data that was already legally expunged.

The fact that it includes data that was legally expunged actually makes this service more valuable than traditional background checks to certain types of people.


Legal research services (access to dockets, case text, etc.) tend to be extremely expensive. I think making that more widely available is a good thing for people with limited resources. It's kind of nuts how impenetrable the legal system is if you don't have any resources.


Great point.

It's extremely difficult to represent yourself pro se if you don't have access to information about how cases like yours might unfold, arguments that have been used, how well those arguments have worked, how the cases have been decided, whether a company has settled a similar case as yours, and so on.


Is that really a bad thing? Nobody should be representing themselves because they have no idea beyond Law and Order what all is supposed to be happening around them. That's why you get a complementary attorney if you can't afford one.

It's just too important to risk, no?

Like DIY surgery. It's quite expensive and impenetrable to be doing your own appendectomy and I'm not mad about that. In both cases you could if you really had to but a high barrier to entry for me is not a bug but a feature.


> That's why you get a complementary attorney if you can't afford one.

Not in US you don't. Leaving aside how you don't have a right to a lawyer when you are on either side of a civil matter, most states will send you a bill should you avail yourself of the "a lawyer will be provided for you" part of the Miranda rights. Now, if you are broke, they'll still provide services to you.

In Canada you actually get free legal advice before any questioning.


To clarify, that's not a bill you only have to pay if you get a windfall. That's a bill that (in some places) will be aggressively collected as part of your fines.

If you're in prison you also have nearly no rights to a lawyer. (For example, if you want to sue the prison system for inhumane conditions)

Edit: For example, you may need to pay these fees to be allowed to drive, which you may need in order to get to work or buy food.


> that's not a bill you only have to pay if you get a windfall

You can't get a windfall as a criminal defendant, just a conviction or not.


That's (mostly) true, but unrelated to what I meant. I meant to say if the government determines you can't afford a lawyer, it will (sometimes) bill you for the lawyer it provides. Sometimes if you're very poor you get debts that in theory you have to pay but in practice won't be pursued unless you get a windfall, because you don't have any money. This isn't one of those.


I see what you mean - a windfall independently of the criminal prosecution. That makes sense.


Anna Sorokin got 300k from Netflix. Incarcerated individuals may still inherit. Sometimes money comes. The States are not always aggressive in pursuing such funds; but the Feds usually (not always) are.


So poor people with an interest in surgery should have zero access to medical journals? This is all just taxpayer funded information that you and I pay for every year in taxes.


Just like surgery, your argument only works given effective public funding.


Our legal system is built on case law. Reading the law as it was passed is not enough; one must consult rulings in the relevant jurisdiction for precedent. I think the benefits of making this type of information widely available outweigh the potential downsides.


What does that have to do with listing the calendar calls of 25-year-old divorce proceedings?


You and I have no idea, but does that mean no one else might, for reasons we're unaware of?


Absolutely, but I think the general public tends to completely misinterpret what they’re looking at and will often draw faulty conclusions if they’re not given the full context provided by formal training (e.g., Dr. Google).

The repercussions of an employer doing their own unqualified background check on someone can be detrimental to that person’s future wellbeing.


When you say "used to persecute," that's not accurate in any legal sense. Many states have statewide repositories of court data. Also, when you say "can be used for," that's extremely broad, and as a standard for taking a position on something, I don't think it makes a lot of sense. Do you mind clarifying?


Fantastic. Love it. Wish I could download the whole 630M DB, not just 700K cases from Texas.

I especially love the interface. It's light and fast. Not unnecessarily burdened by JavaScript. Bravo to that.


thank you!


Today I learned that 20 years ago I was a defendant in an unlawful detainer (eviction) lawsuit regarding an apartment I shared in college. I had moved out of the apartment after graduation. Apparently my roommate stopped paying the rent and the landlord sued both of us. I was never served and didn't know about the case until now.


Today I learned that my boss has a lead foot. 15 speeding tickets in six states, all over 85 MPH.


Sounds like a privacy nightmare, though.


That would be more of a problem with the legal system, not this search page.


That's not true. Court records used to be accessible in person only. You would have to be motivated to find anything. This puts everyone's past and often forgotten transgressions on display.

This might look cool or even useful to some, but it's straight up immoral.


Court records, including traffic violations, have been accessible online in the jurisdiction where I live for nearly 25 years. They've been aggregated from that website by third-parties for most of that time, too (which I know because I've done contract sysadmin work for one of the courts for most of that 25 years).


You and others are essentially arguing that we should put artificial barriers in place for accessing information that is public as a matter of law. Arguably there is information that's public today only because the circumstances at the time when that decision was made were different. But the solution would be to make less information public if it's a problem. Not to tell people they can only access this information when the town clerk is in the office--and he doesn't come in much.


I disagree. That is an excellent throttling mechanism. Keeps records open but preserves a modicum of privacy.

It's another topic if court records should be public at all, but I gather that's a pretty integral element of US judicial system.


How does that work? Lawyers and others can run searches and have for decades. This isn't anything new, except now you don't have to pay a lot of money.

Either it's private or it's not. Fake throttling that discriminates based on one's ability to pay or to suffer inconvenience is ridiculous. Not to mention, this information has been available in this format for decades.


Mixed feelings on this, and a lot of it unfortunately stems from these systems requiring in-person access only, as well as courts generally being inaccessible through FOIA. Had these records been available through FOIA from the beginning, for example, they would have gone through a review/redaction process. But they're being released now, which can in many ways be seen as a reaction to the general lack of access to this information. The extensive overuse of courts for non-violent cases definitely doesn't help either.

As a researcher, I see deep problems with the inaccessibility of court information: it prevents the general public from learning about systemic issues, for example identifying extensive abuse by judges (individually or as a group), or identifying whether bail is applied uniformly.

I don't know what the solution is, and things get trickier the more you look at them. Restricted access isn't a perfect answer, since it allows gatekeeping against those who are critical. Lawyers I've talked with who have access basically have to stay completely out of the public spotlight while they have it, for fear of losing it. And our massive systems around incarceration have shown themselves to be uninterested in providing information to those who are critical of them. We've dug ourselves into a pretty deep hole.


The documents being released are currently publicly available. It looks like records of minors either don't show up, or don't show details. It also looks like expunged convictions do not have the case showing up.

FOIA doesn't apply for two reasons. One is like you say - many court documents are considered privileged and not subject to FOIA (which I agree has many issues around things like complaints). The other is that documents that are publicly available like this don't need to be requested through FOIA since they are already available.

I can go to my state's website and get the same information as on this site. What this site does is allow you to search all states for free. There are plenty of sites that will allow you to search for people like this, but they currently charge money.


Access to justice and information about justice shouldn't be gatekept through money. I understand where you're going with it, but it's very close to the same reasoning used to justify ex[tp]ensive bail -- it only fucks the poor.


"Access to justice and information about justice shouldn't be gatekept through money."

I agree. My comment was about how this site allows that access for free. And also that FOIA is moot for this information - it's publicly available without a request.


My point is that we should have gone through a FOIA route from the beginning.

And no, a lot of important information still isn't publicly available, so we still have these issues that need to be ironed out for the exact same reason as I'm describing. This release still only scratches the surface. I'm able to get court documents from city law departments, but the process takes forever (ten complaints per week, for example) and the alternative is to go downtown to work with computer systems that have poor search capabilities.

Had we been (and been able to be) aggressive from a FOIA perspective from the beginning, the inefficiencies of these systems (e.g., segmenting the private information stored in court records) would be more ironed out.


"This might look cool or even useful to some, but it's straight up immoral."

Again, that's an issue with the legal system. States make this information available to the public online, and have done so for a long time. This is an aggregation, and similar services have been around for a long time.

If you want this information to be private, then you need legislation to be passed. Frankly, this isn't even the most immoral thing the system is involved in. For example, 2-10% of the incarcerated are wrongly convicted. Or the fact that complaints and misconduct of judges are so secret that even if they contain exculpatory evidence they are not required to be exposed. Or that magistrates in most places are not required to have a law degree or pass the bar, leading to the farcical outcome that the lawyers arguing the case have more knowledge of the law than the "judge" who is supposed to be the authority on the law. The list goes on and on. One day, enough people will be screwed over by the system that there won't be support for it anymore.


Very true. We have to rethink a lot of the rules around privacy, considering the scale at which technology has made mass surveillance possible.


Why? There's no private information being shared right?


Technically no but it makes a difference if you have all that information available easily through the internet vs having to put some effort into obtaining the records one by one. Same goes for surveillance: technically you have no reasonable expectation of privacy in public spaces but once it becomes feasible that somebody can plaster a whole state/country with cameras and can analyze the data automatically it gets a little scary. I think a lot of privacy rules need to be revised in the context of current and future technological capabilities.


Not officially, no. But that's not the only possible interpretation.


There are records here connecting my legal name to my deadname, not even on a court name change order. I had gone to lengths to keep things private. This is devastating. I want to cry.


I learned there's an attorney with my exact name and middle initial and his kajillion filings obscure my traffic tickets.


That is great. Regular people's access to this information is a great power equalizer. I had lost a small case (fine print and a lot of undelivered promises) after 3 lawyers said I'd lose, and then won it on appeal after finding, in an online database (sadly no longer available), a similar precedent that applied the law exactly to my situation. According to Yelp and a case search, the company I had this case with was regularly taking people for a ride, and people very grudgingly paid hundreds to several thousands of dollars a pop, mostly because of the fine print. I became the first one on that list with a winning case.


That's a great use case. Thank you for sharing!


OP submitted this site in November 2020 with 400M cases[1]. Other than the increase in cases, what else has changed?

[1] https://news.ycombinator.com/item?id=25150702


Right, more cases primarily. Performance has also been optimized, so searches, search result pages, and individual pages load significantly faster: most searches load in under 200 ms, and most pages, including SRPs, load in under 20 ms. There are also search syntax improvements (see the info page for details). The search is still not very granular or field-specific, but that's definitely an area for improvement.


Thanks, I think it would be nice to have a changelog.


Not as a criticism but just FEI (For Everyone's Information), reposts are ok on HN after a year or so. This is in the FAQ: https://news.ycombinator.com/newsfaq.html.


Here's Steve Jobs' speeding ticket: https://www.judyrecords.com/record/vde11sdzw25ac


hmmm, middle initial checks out. though it's possible it's another steve.


Was trying to find speeding tickets of John von Neumann, but in vain. It would be nice if one could limit search by years.


“One does not have to be a Richard Feynman to figure out that 200 tons is 100% greater than 100 tons.“ https://www.judyrecords.com/record/dhuql2nm6942


Apparently importing a Jaguar through Canada went horribly wrong for him: https://www.judyrecords.com/record/0vctgni5684d


> Argued and Submitted June 3, 1981.

John von Neumann died in 1957. The name is a bit generic, so many results show up. Hence I wished there was a way to limit search to a range of years.


Good call, now I'm embarrassed. I should've known that. Funny how the mind works. I knew he died in his 50's and was involved in the Manhattan project but somehow was content lumping him in with all the other scientists from Operation Paperclip and using loose math that 1981 was possible.


This seems pretty good at first glance but there's significant room for improvement. Since this is HN, allow me to nitpick...

- "630M" is a big number, sure, but I don't have a sense for what % of total court cases it corresponds to. Is it closer to 10% or 90%? And either way, which ones are included vs. excluded? What was the criteria used? Accessibility, date, costs?

- I get the artistic view behind the choice of typography, but the font is just too large. I find myself having to scroll just to get as far as the 5th result. Information density is good in search engines.

- The results consist of two pieces: the name of the court (followed by "record", which is unnecessary) and a short snippet, but not the actual name of the case... which is an interesting choice given that the name of the case is stored in a database field as evidenced by the fact that it is in the <title> tag of any detail view

- I also think the snippets are too short. Together with the previous point, this means the site basically forces me to click on each potential match to see if it is what I wanted or not.

- The URLs are... interesting. Searching for anything takes you to "https://www.judyrecords.com/getSearchResults/?page=1" which does not identify your search. Somehow this is using GET but not storing the form input in the URL but locally somehow... so searching for "foo" in one tab, "bar" in a different tab, and hitting refresh on your "foo" tab will then show "bar" results there. Which is not only "Not Cool", but seems actually harder to accomplish than a straight up form using GET

- And then the actual results have URLs like "https://www.judyrecords.com/record/qxemfajbcae3". I'd be fine with a slug, really, but in 2022 I expect URLs to be API-like

- I can't search for specific cases, e.g. "paramount communications, inc. v. qvc network, inc" returns a bunch of results, none of which are the actual case I'm looking for which is a hugely influential precedent
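For comparison, here is a minimal sketch of the straightforward GET approach, where the query itself lives in the URL. Only the `/getSearchResults/` path comes from the comment; the `q` and `page` parameter names are hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Path taken from the comment; the "q" and "page" parameter names are hypothetical.
SEARCH_PATH = "https://www.judyrecords.com/getSearchResults/"


def search_url(query: str, page: int = 1) -> str:
    """Build a results URL that fully identifies the search, so each tab,
    bookmark, or refresh shows the results for its own query."""
    params = urlencode({"q": query, "page": page})
    return f"{SEARCH_PATH}?{params}"


def parse_search_url(url: str) -> tuple:
    """Recover (query, page) from such a URL on the server side."""
    params = parse_qs(urlsplit(url).query)
    return params["q"][0], int(params.get("page", ["1"])[0])
```

A plain HTML `<form method="get">` with an `<input name="q">` produces the same kind of URL with no JavaScript at all, which is why this is usually simpler than keeping the query in client-side state.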


Valid criticisms, thanks for pointing them out as areas for improvement. Good question about the % of total cases, though I think there are some estimates on that. My guess would maybe be 100M+ cases per year.


> My guess would maybe be 100M+ cases per year.

If this is close, then that blows my mind. That would be roughly 1 court case per ~2.5 American adults per year.
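A quick check of that arithmetic. The adult-population figure is an assumption (about 258 million people aged 18 and over in the 2020 Census), not something from the thread.

```python
# Assumption: about 258 million US adults (the 2020 Census count of people 18+).
US_ADULTS = 258_000_000
# The "100M+ cases per year" guess quoted above.
CASES_PER_YEAR = 100_000_000

adults_per_case = US_ADULTS / CASES_PER_YEAR
print(f"roughly 1 case per {adults_per_case:.1f} adults per year")
# prints: roughly 1 case per 2.6 adults per year
```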


I think it depends on how narrowly you define a court case. The number doesn't seem too high if you factor in traffic cases. But you're probably right that, on a narrower definition, it would be too high.


Some random searches I did showed it does include at least some traffic cases. I saw one for speeding and another for texting while driving.


I note that this isn't just court cases. I have a long ago (paid) traffic ticket in there--well, not the ticket but a record pointing to a no longer existing ticket. (Maybe that's technically a court case though.) Something I wrote is also in a footnote to a patent filing.


Whenever I see stuff from lawyers, especially litigation documents, the font size, typeface, and white space always punch me in the eye. It's so gruesome.


There are quite a few court cases with the string "ASDF".

Are people just being lazy? https://www.judyrecords.com/record/vfa3d40l07812


I'm always amazed at the rampant patent trolling that happens with deep learning papers and ideas. In this dump, if you search for the names of famous ML researchers (such as Yoshua Bengio [1] or Yann LeCun [2]), you will find hundreds of troll patents citing their work. Not all of them are trolls, though. Maybe this corpus could be used to identify them automatically, perhaps by merging in data from arXiv?

[1] https://www.judyrecords.com/record/vibvgc4w4e2bc

[2] https://www.judyrecords.com/record/v9kbckceib0c9


Does anyone know why American court opinions tend to omit the word "the" before plaintiff and defendant? For instance

"Afterwards plaintiff sued defendant claiming damages".

In Australia and the UK, this would be

"Afterwards the plaintiff sued the defendant claiming damages".

In general US opinions seem more concise and formulaic than their Anglo counterparts. This is just one striking example. I'm just curious about the origin of this distinction. Perhaps there is some text on concise legal writing prescribed at US law schools which offers such a suggestion?

Another curious difference, it's an opinion in the US, a decision or judgment in Australia/UK.


American courts generally give very strict and tight page limits when submitting briefs, so American legal writing has evolved to be concise as possible. The best lawyers regularly and repeatedly cut their briefs to length, removing any superfluous words. There's no requirement to omit "the" before plaintiff and defendant, but it's acceptable and saves space, so everyone does it.


Thanks, this is a very convincing answer.


Aren’t those usually capitalized as well? I’ve always thought that style in legal texts means “a proper noun defined previously”; in the case of plaintiff and defendant, probably on the first page. That said, I have no legal background, so take that with a grain of salt.


They would appear capitalised on the cover page, not generally in text.

Here is a recent example from a 2022 SCOTUS opinion.

"In rejecting petitioners’ allegations, the Seventh Circuit did not apply Tibble’s guidance. [...] The court determined that respondents had provided an adequate array of choices, including “the types of funds plaintiffs wanted (low-cost index funds).”"[0]

By contrast, a decision of the High Court of Australia:

"The appellants applied to the Supreme Court of New South Wales for orders that the third respondent, a former director of Arrium, appear for examination and produce documents. Orders were also sought for the second respondent (the auditor) and the bank who advised on the capital raising to produce certain documents.[1]

[0] https://supreme.justia.com/cases/federal/us/595/19-1401/

[1] https://eresources.hcourt.gov.au/showCase/2022/HCA/3


At least in US contracts, it's fairly common to establish role based pseudonyms at the beginning of the contract. Especially for reused contracts. Presumably, the same style applies to our legal system.

For example, a contract may read:

This contract is between John Smith (hereafter Employee) and XYZ LLC, a Delaware company (hereafter Employer). Employee agrees to provide Employer with services for...


Guessing: it’s an old legal system (not in relative terms, perhaps, but 250 years is a decent chunk of time) and a bunch of the language has stayed pretty similar over time.


I don't think it's "the plaintiff". It's John Doe, hereinafter referred to as Plaintiff.


You should be able to do an exact match search here. Trying to use double quotes on my name turns up a boatload of hits, but most of them appear to be cases where my first name is found somewhere on the page, and somewhere else my last name is found somewhere on the page.

It should also be possible to limit the search by city, state, and/or region, as well as by timeframe.

Not very useful.


Exact match searches are supported.

Can you give an example?

Also, to limit by other qualifiers, you can add those to the search terms. However, the search isn't field-specific, so that can only be done loosely (like Google), not in a strict field-by-field sense. It's difficult and time-consuming, but it's something that could improve the search.
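To illustrate the difference between adding qualifiers loosely and filtering field by field, here is a toy sketch. The `Record` fields, the sample data, and both functions are hypothetical and do not describe how judyrecords works internally; they only show why loose matching can return a record that merely mentions a year in its text.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Record:
    # Hypothetical schema for illustration only.
    court: str
    state: str
    year: int
    text: str


def loose_search(records: List[Record], query: str) -> List[Record]:
    """Google-style: every term must appear somewhere in the metadata or text."""
    terms = query.lower().split()
    hits = []
    for r in records:
        haystack = f"{r.court} {r.state} {r.year} {r.text}".lower()
        # Substring matching means a year term can also hit a year mentioned in the text.
        if all(t in haystack for t in terms):
            hits.append(r)
    return hits


def field_search(records: List[Record], query: str,
                 state: Optional[str] = None,
                 years: Optional[Tuple[int, int]] = None) -> List[Record]:
    """Field-specific: terms match the text only; state and year range are strict filters."""
    terms = query.lower().split()
    return [
        r for r in records
        if all(t in r.text.lower() for t in terms)
        and (state is None or r.state == state)
        and (years is None or years[0] <= r.year <= years[1])
    ]
```

The year-range parameter is also the kind of filter that would let a search for a common name be limited to a span of years.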


How are exact match searches supported? There is absolutely no evidence on the page that this is possible.

How does searching for “brad knowles” match “brad alan knowles”, when I put my name in quotes? How does it match a case where “brad knowles” does not appear to be used anywhere on the page, but where one line matches “knowles” and then another line matches “brad”?


"brad knowles" - 14 results returned. The exact phrase is found in each.

brad knowles - 1925 results returned, which include results with brad and knowles in the text, where close proximity cases are ranked higher. Results with brad and knowles further apart will be ranked toward the bottom.
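The behavior described here (exact phrases with quotes, otherwise all terms required and ranked by how close together they appear) can be sketched roughly as follows. This is an illustrative toy ranker under assumed tokenization rules, not the site's actual search engine; all function names are hypothetical.

```python
import re
from typing import List, Optional


def tokens(text: str) -> List[str]:
    # Lowercased word tokens; spaces, punctuation, and line breaks all act as separators.
    return re.findall(r"\w+", text.lower())


def has_phrase(text: str, phrase: str) -> bool:
    """Exact-phrase check: the phrase's tokens appear consecutively and in order."""
    t, p = tokens(text), tokens(phrase)
    return any(t[i:i + len(p)] == p for i in range(len(t) - len(p) + 1))


def min_window(text: str, terms: List[str]) -> Optional[int]:
    """Width (in tokens) of the smallest span containing every term, or None."""
    t, need = tokens(text), set(terms)
    best = None
    for i, tok in enumerate(t):
        if tok not in need:
            continue
        seen = set()
        for j in range(i, len(t)):
            if t[j] in need:
                seen.add(t[j])
                if seen == need:
                    width = j - i + 1
                    best = width if best is None else min(best, width)
                    break
    return best


def rank(docs: List[str], query: str) -> List[str]:
    """Keep documents containing all terms, closest-together terms first."""
    terms = tokens(query)
    scored = []
    for d in docs:
        w = min_window(d, terms)
        if w is not None:
            scored.append((w, d))
    scored.sort(key=lambda s: s[0])
    return [d for _, d in scored]
```

Note that this tokenizer treats a line break like a space, so a phrase split across lines still counts as an exact match; whether that is desirable is a design choice.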

When you use quotes above, I'm not sure if that means what's in quotes above is what you searched or that you searched with quotes, but based on these checks and what I see, exact match searches as well as weighting without use of exact match quotes is working correctly.


Here's a case that shouldn't match an exact search, because the two words are separated by a line break: https://www.judyrecords.com/record/nqt11i9y33dec

I've found other cases where most of the hits shown on the screen are for one word or the other, but not both together. There is at least one hit on each of those cases where the two words are properly found side by side and in the correct order, and so it is technically a hit for the search. But the display is not correct, because on displaying the article it is showing each word hit individually from the others.

Using proper ASCII quotes to force an exact match instead of somehow getting smart quotes is definitely an improvement, but there's still more work to be done here with regards to line breaks and display of hits.


You can put 3 commas after your name to search for an exact match, eg brad knowles,,,


I think this is a smart-quotes issue -- smart-quotes are removed:

“brad knowles” -> brad knowles

"brad knowles" -> "brad knowles"

I always set my OS to disable smart-quotes. Worst is when you paste them into source code.


Maybe that can be handled better automatically. Good point, thanks for mentioning.
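One way to handle this automatically is to normalize typographic ("smart") quotes to ASCII before parsing the query. A minimal sketch; the function name and the split into phrases and loose terms are assumptions for illustration.

```python
import re
from typing import List, Tuple

# Map typographic quotes to their ASCII equivalents before parsing.
SMART_QUOTES = str.maketrans({
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark
})


def parse_query(query: str) -> Tuple[List[str], List[str]]:
    """Split a query into exact phrases (quoted) and loose terms (everything else)."""
    q = query.translate(SMART_QUOTES)
    phrases = re.findall(r'"([^"]+)"', q)
    loose = re.sub(r'"[^"]*"', " ", q).split()
    return phrases, loose
```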


So, a couple of weird things here.

First, when I go to the website, whether it's the mobile version or the desktop version, the on-screen iPadOS keyboard is immediately hidden from me. I have no way to type anything into the website, unless I flip out the physical keyboard that just happens to be attached to this case. I have never seen that kind of behaviour before on any other website, ever.

Second, Apple does not make it easy to figure out where the "turn off smart quotes" option is located. I think I turned it off under the switch for Settings > General > Keyboard > Smart Punctuation but I'm not 100% certain. Nevertheless, this is the first case where I recall using quotes where they were not honored as I would have expected. I'm not sure where the blame lies on this -- is it a user expectation problem, an iPadOS problem, or a website problem?

I'll try again, this time trying to make sure I use the proper type of quotes.


What kind of smart-ass OS would do that in a single line text input form?

This is a feature that belongs to a word processor.


Fuck me.

How did I not spot that?!?


Just an FYI -- you probably need to declare the use of Google Analytics explicitly in your terms. (Although my personal preference is something that does not require consent, like Matomo or Plausible Analytics :)


why would that be? I don't think I have ever seen a site that disclosed they are using GA?

FWIW: I also prefer Plausible, and have all GA traffic blocked in my hosts file


Since it collects personally identifiable information (at least IP addresses, but it's not clear where it stops) this requires special treatment under GDPR: https://en.wikipedia.org/wiki/Google_Analytics#Privacy
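For context, a common data-minimisation step for stored IP addresses is to zero the last octet, so a log entry points to a /24 network rather than a single host. Whether that alone satisfies the GDPR is a legal question this doesn't answer, and nothing here says GA or the site does it; the C function below is a hypothetical illustration:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: zero the last octet of a dotted-quad IPv4
 * address, so "203.0.113.42" becomes "203.0.113.0". out must hold at
 * least 16 bytes. Returns 0 on success, -1 if ip is not a plain
 * dotted quad with every octet in 0..255. */
int truncate_ipv4(const char *ip, char *out)
{
    unsigned a, b, c, d;
    char extra;
    assert(ip != NULL && out != NULL);
    if (strlen(ip) > 15)                 /* longest form: 255.255.255.255 */
        return -1;
    /* %c catches trailing junk such as a fifth ".5" */
    if (sscanf(ip, "%u.%u.%u.%u%c", &a, &b, &c, &d, &extra) != 4)
        return -1;
    if (a > 255 || b > 255 || c > 255 || d > 255)
        return -1;
    sprintf(out, "%u.%u.%u.0", a, b, c); /* at most 13 chars + NUL */
    return 0;
}
```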


Why does an unmonetized website about US court cases, presumably targeted towards Americans, need to care about GDPR?


Who says this website is unmonetized? The search query stream alone is very valuable.


There's no way to make an account, and there isn't any functionality for payment. There aren't even any Paypal/Patreon accounts or donation links. The info page on the site literally says "judyrecords is a 100% free nationwide search engine".

How do you think this website is monetized in the absence of those things?


Because the website is accessed by Europeans, meaning it is collecting Europeans' data (via Google Analytics). But also because California has the CCPA, which is more or less equivalent to the GDPR (as far as I understand, at least; I might be incorrect).


Got it. In that case, whatever European commission is in charge of GDPR fines can charge them a % of their $0/year income like other GDPR violators.


I'm not sure that you understand GDPR fines; the annual revenue is only used as an upper limit for companies with massive revenue.

The service being free doesn't protect you.

> The less severe infringements could result in a fine of _up to €10 million, or 2% of the firm’s worldwide annual revenue_ from the preceding financial year, _whichever amount is higher_.

> The more serious infringements go against the very principles of the right to privacy and the right to be forgotten that are at the heart of the GDPR. These types of infringements could result in a fine of _up to €20 million, or 4% of the firm’s worldwide annual revenue_ from the preceding financial year, _whichever amount is higher_.

Source: https://gdpr.eu/fines/
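To make "whichever amount is higher" concrete: each cap is max(fixed amount, percentage × revenue), so zero revenue still leaves the full €10M or €20M ceiling. A small C sketch of that arithmetic (the function name is hypothetical; amounts in euros as quoted above). It computes the maximum possible fine, not what a regulator would actually impose:

```c
#include <assert.h>

/* Hypothetical helper: the maximum fine under the two tiers quoted
 * above, in euros. Each tier is the HIGHER of a fixed amount and a
 * share of the previous year's worldwide annual revenue. */
double gdpr_fine_cap_eur(double annual_revenue_eur, int more_serious)
{
    double fixed = more_serious ? 20e6 : 10e6;  /* EUR 20M or EUR 10M */
    double share = more_serious ? 0.04 : 0.02;  /* 4% or 2% of revenue */
    double by_revenue;
    assert(annual_revenue_eur >= 0.0);
    by_revenue = share * annual_revenue_eur;
    return by_revenue > fixed ? by_revenue : fixed;  /* whichever is higher */
}
```

The revenue share only overtakes the fixed amount above €500M of revenue, which is why a free site's $0/year income doesn't lower the cap.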


Good to know. Hope the author of the site is aware of this if they ever go to Europe.

They should just block European IPs like other sites do, though. It'd be safer for them and also less work.


Yep. But as I said, California CCPA is more or less equivalent to GDPR, so even this might not be enough.


It was just an example; similar issues occur under the CCPA and other legislation. (Assuming no user is covered by GDPR, which is likely not the case.)


This is why services simply block European IP addresses. If your options are:

1. do extra work to make your service comply with laws in areas you don't live or have any customers

2. put a blanket IP ban in place for these places where you don't live or have customers

3. do nothing

Bigger companies will do number 2, and individuals/small business/small and unmonetized projects will do number 3.
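Option 2 usually comes down to checking each client address against a list of CIDR blocks, typically exported from a geolocation database. A minimal C sketch of just the matching step; the block list in the usage below is a hypothetical placeholder (192.0.2.0/24 is a documentation range, not a real country allocation):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One IPv4 block in CIDR notation, e.g. 192.0.2.0/24. */
struct cidr_block {
    uint32_t network;  /* network address, host byte order */
    unsigned prefix;   /* prefix length, 0..32 */
};

/* Return 1 if addr (host byte order) lies inside any of the n blocks,
 * else 0. */
int ip_is_blocked(uint32_t addr, const struct cidr_block *blocks, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++) {
        unsigned p = blocks[i].prefix;
        uint32_t mask;
        assert(p <= 32);
        /* Shifting a 32-bit value by 32 is undefined, so /0 (match
         * everything) gets an explicit all-zero mask. */
        mask = (p == 0) ? 0u : (uint32_t)(0xFFFFFFFFu << (32 - p));
        if ((addr & mask) == (blocks[i].network & mask))
            return 1;
    }
    return 0;
}
```

For example, with a single block { 0xC0000200, 24 } (192.0.2.0/24), 192.0.2.42 is blocked and 198.51.100.1 is not.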


[EDIT]

Sorry everyone and thanks everyone for the sentiments. I have been advised not to write anything public about this further.

This may be nothing, and this will also pass. Unfortunately, I have to delete my comment here.

About: IBM seem to have stolen my idea and patented it. I have the full source code, and multiple proofs that I'm the owner.


Lesson learnt: if you have an idea, even the most mundane, boring one, don't just go about writing it openly -- there are too many IBMs that will claim it as their own.


Do you have any references to the post? Original post/diagrams or other references?


That's pretty unbelievable, wow.


Searched my former boss on this. Hoooo doggy, I knew he was up to some questionable financial practices, but it looks like it caught up with him.


603 total cases for: emacs

260 total cases for: "mind control"

768 total cases for: "donald j. trump"

State of Minnesota vs Steven Captain America Rogers https://www.judyrecords.com/record/vfvd30smme78f


> mind control

I love it! (Is witchcraft constitutionally protected?!)


It's much more limited in what's covered, but when I had some questions around VAT I found the website of the British and Irish Legal Information Institute really helpful: https://www.bailii.org/

It's noindex, so it would normally be super hard to find the cases if you don't search on the BAILII site directly.


The fact that this is free is mind boggling. Maybe four or five years ago I had access to a commercial court search API which had 850mn cases nationwide, and it cost a pretty penny.


Crazy. Searching my name finds almost every traffic ticket I’ve ever had, as well as a bunch of people with the same name as me getting tickets.


Using Apache mpm prefork and server load < 1. 132 busy workers. Not bad.

https://ibb.co/4J7STV6

https://ibb.co/NYwWd3T


TIL that I'm cited in a lot of patents.


Entity recognition on this would be an extreme value add.

Don't you think?


It's a difficult problem, but that would definitely be useful.


https://patents.google.com is great for this


Really neat.

I found my dad's DUI.

He went into rehab and cleaned up. I got to spend more than a decade with a sober father, before lung cancer got him.


What the heck does this use to search so fast? We use Elastic at work, and for 100K entries, it crawls compared to this.


Thanks!

Replied to this comment here with some additional info: https://news.ycombinator.com/item?id=30399881#unv_30400160


Sounds like you're resource-starving it if it crawls at 100k. Either that or ES is perhaps not your bottleneck?
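For context on why keyword search over hundreds of millions of records can be fast at all: engines built on an inverted index (Lucene, which Elasticsearch uses, is one) keep a sorted list of document IDs per term, and a multi-word query is answered by intersecting those lists instead of scanning documents. A generic sketch of that intersection step in C; this is not judyrecords' implementation, whose details are only in the linked comment:

```c
#include <assert.h>
#include <stddef.h>

/* Intersect two sorted, duplicate-free arrays of document IDs (posting
 * lists), writing the common IDs to out. out must have room for
 * min(na, nb) entries. Returns the number of IDs written. Runs in
 * O(na + nb) because every step advances at least one cursor. */
size_t intersect_postings(const unsigned *a, size_t na,
                          const unsigned *b, size_t nb,
                          unsigned *out)
{
    size_t i = 0, j = 0, k = 0;
    assert(out != NULL || na == 0 || nb == 0);
    while (i < na && j < nb) {
        if (a[i] < b[j]) {
            i++;                 /* a is behind: skip ahead in a */
        } else if (a[i] > b[j]) {
            j++;                 /* b is behind: skip ahead in b */
        } else {
            out[k++] = a[i];     /* document contains both terms */
            i++;
            j++;
        }
    }
    return k;
}
```

For a query like brad knowles, `a` would hold the postings for brad and `b` those for knowles; the IDs they share are the hits.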


As a corpus for ML training, I'd be interested in whether there are linguistic predictors for court victories in opening statements and whether optimizing for them could yield an advantage.


This site will be the first stop for anyone wanting to harass another person online. Sometimes a little friction is a good thing.

I love projects like these, but they're the digital equivalent of "dual use technologies". They can be used for good or evil.

That said, nice work.


There's a whole lot of information that the collective "we" decided to make public for various reasons. But those decisions making things public were in the context of the information being in some dusty town, county, or state office somewhere.

With more and more of that information being digital, we've more or less punted on the question of whether all that information should still be public. Overall, more transparency is probably good, but, as you say, it's not an unalloyed good, as most of this information will live forever and be cheap/easy to access.


> I love projects like these, but they're the digital equivalent of "dual use technologies". They can be used for good or evil.

Isn't pretty much every technology "dual use"? Just look at social media. You need a platform that gives you the ability to harass someone in order to actually do it.

> Sometimes a little friction is a good thing.

We as a market repeatedly justify the frictionless experience of being spied on for ads in ways that we have little to no control over, but we're gonna deny ourselves the frictionless experience of being able to see public records because we're worried about our privacy?


On the other hand, powerful people who wanted to harass you or hurt you have had access like this for a long time.

It's how I feel about facial recognition technology or other ML-based technology too. The worst people who could ever have access to it, already had access to it. Giving everyone access to it is just leveling the field.


I'd love a world where the "powerless" have the same ability to leverage surveillance as the "powerful".

I'd love it if we could achieve that balance by eliminating surveillance, but I don't see that as a realistic outcome (at least initially).

In the absence of eliminating surveillance I'll take full public transparency. Maybe such transparency would even drive the elimination of surveillance.


I tried some rather specific queries for things I know should return some records, and it was fairly useless, so I'm not terribly worried.

Just anecdotally, I have a fairly uncommon last name but common first name, I know what states/counties I have appeared in court in and couldn't find any of the records. If you search something like <name> <county> <state> the results are overloaded with <county> <state>, for example.
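A common remedy for place names swamping a name search is inverse document frequency (IDF) weighting: a term's weight falls as the number of documents containing it rises, so a rare surname outweighs a common county or state name. Whether judyrecords ranks this way isn't stated. Below is a coarse integer version, floor(log2(N / df)), sketched in C so it needs no math library; the function name is hypothetical and the counts in the usage note are made up:

```c
#include <assert.h>

/* Coarse inverse document frequency: floor(log2(total_docs / docs_with_term)),
 * computed with integer shifts so no math library is needed. Rare terms
 * get large weights; terms found in most documents get weights near 0. */
unsigned coarse_idf(unsigned long long total_docs,
                    unsigned long long docs_with_term)
{
    unsigned long long ratio;
    unsigned weight = 0;
    assert(docs_with_term > 0 && docs_with_term <= total_docs);
    ratio = total_docs / docs_with_term;  /* >= 1 */
    while (ratio > 1) {                   /* count halvings: floor(log2) */
        ratio >>= 1;
        weight++;
    }
    return weight;
}
```

With N = 630,000,000, a surname appearing in 500 records gets weight 20, while a state name appearing in 20,000,000 records gets 4, so summing per-term weights ranks records matching the surname first.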


Yeah, the only missing piece for full-text harassment is a "Google alert" for particular keywords. Put in the names you wanna track and receive a delightful alert in your inbox with rocks to throw over other people's roofs.

EDIT: the tech is great, but I think there should be a record of who is accessing the data, for what purpose, terms for how it can be used in a civil way, and means to go after misuse.


How is harassment as a service not a thing yet?

You get a "Google alert" for your target. The service presents you with several buttons:

1. Send an AI-written email

2. Post a link to the new info on their Facebook page

3. Tweet an image macro with the incriminating text embedded @ them


It is a thing, but making it so easy to find and access court documents mentioning someone's name will add to the pile of rocks malevolent people can throw at anyone.


> How is harassment as a service not a thing yet?

What makes you think it isn't?


Given that one third of Americans have criminal records of one sort or another, somebody almost certainly has a criminal in their family or close circle of friends, so I suppose finding out about someone's criminality is about the same as finding out somebody watches porn.

on edit: actually one third is probably overstating but close.


>This site will be the first stop for anyone wanting to harass another person online. Sometimes a little friction is a good thing.

Precisely. I was thinking of how much fun we'll be having on EFnet with this.


I think broadly the same tradeoffs exist for any search system, like Google or PACER for example.


Not sure how good this is on a "regular citizen" level. I tried several drug/alcohol-related incidents that I knew about and nothing came up.


Seems to completely lack records from some states.

I searched a close friend's last name and I got all the stuff I expected - his civil suit, his divorce, his sister's paternity suit, a foreclosure involving his cousin. Seemed very complete.

Searched my own last name and a whole bunch of records of my uncle's various criminal activities came up. Surprisingly what was missing was records of my dad's various criminal activities.

My friend, my dad, and my uncle all live in three different states. My dad is currently incarcerated.


Is this what Aaron Swartz was trying to achieve [1] here?

[1] https://arstechnica.com/tech-policy/2013/02/the-inside-story...


No, this is just "metadata" more or less - who was sued by whom over what and what were the individual events in the case. PACER has the individual filings - the complaints, briefs and orders and so on.


Cool project. It led me to a patent filing, which I guess is also part of the court cases.

Anyway, now I know someone has tried to patent a scooter that looks like a firetruck.

https://insight.rpxcorp.com/patent/USD552186S1


This is wild, I have so many speeding ticket judgements. I forgot I’d ever set foot in Idaho.


Maryland driver with license plate “HIJINKS” pleads guilty to speeding ticket https://www.judyrecords.com/record/r28qv86bva834


Nice work, very fast, simple display. Also, I learned that a speeding ticket I never paid back in the 80's is still an open case, and that a few articles I wrote in the 90's have been cited in dozens of patents. Yippee.


Doesn’t that mean you can be arrested in that state?


Probably? I haven't been there in years, but I'll be contacting them to pay the fine.


OP, will you consider allowing download of the actual dataset (Wikipedia-style)?

If not, what would hold you back from doing so?

I would be OK hosting a mirror of this service including the infrastructure (and handling the associated costs) if it helps.


Is there a Canadian version? Or something similar within a Canadian context.


In Quebec, we have SOQUIJ (Société québécoise d'information juridique) [1], which allows you to search court cases. Other provinces might have something similar.

[1] http://citoyens.soquij.qc.ca/



Good question. Not that I'm aware of.


Wow. What a resource. Do a search for "cDNA patent" and read how many things the Supreme Court can get wrong in just a few paragraphs. (Hint: cDNA is not "composite DNA").


Wow I just found out that a lot of distant family members on the opposite side of the country who I've never met are really bad drivers. Found one of my own moving violations in there too.


Funny. I found my various software patents in there.

For most of them - even though they are my patents - I cannot determine from the patent what it is that I have invented.


I know there is some open source (?) effort to publish and give access to court cases instead of having it behind a paid subscription channeled through the federal court system. Does anyone know how that's going?

And also, are only the primary filings of the court and parties available to be searched? What happens to depositions, evidence records, etc. that are part of the case? Are those ever available to the public?



Mind sharing info on server, backend, costs etc?


Replied to this comment here with some additional info: https://news.ycombinator.com/item?id=30399881#unv_30400160


Damn, even traffic citations in there. Wow.


Great job, Richard. Do you have, or plan to have, an API available? We (Kagi Search) would like to use this.


TIL that a lot of sad MFers share my name...

This tool is awesome, but, in knucklehead hands, could be fairly awful.


I have a speeding ticket listed as "Criminal" in here... That's news to me


Hmm, sounds like a broad classification associated with the record. In the broadest sense, I think all cases are defined as either civil or criminal.


I see another speeding ticket listed as "Non Criminal".

I've got another ticket not listed as anything and actually it has my full DOB and previous address, cool


> I've got another ticket not listed as anything and actually it has my full DOB and previous address, cool

If it's a street you grew up on, what banks do you use? ;)


Chase, why?


I was making a joke because some basic info like name, DOB and address are enough to get into a bank account if the password was forgotten, especially if you know the answer to security questions like "What is the name of the street you grew up on?"


Yeah I was joking too


Whoosh


I also learned recently that if things weren’t sealed then it is available to anyone. Is it possible to create a DB of all such public documents attached to the cases? What would it take to do that?


How is this different from other free legal DBs like Justia and Casetext?


Justia is a general legal info portal and has many high-level court opinions within that portal. Casetext is primarily legal research software and has many US/state codes within its database (https://casetext.com/coverage). I think that summary is right in broad strokes. judyrecords has many more cases than Justia or Casetext: more than 600M, if I had to guess quickly.



You can think of a patent as a legally binding decision itself, and a kind of legal case in its own right.


Even I can see my gist links in cases https://www.judyrecords.com/record/va3htxa9i5abd


This is great. I tried it and found the traffic tickets I got.


I can't find any of mine. I always did traffic school, so I suppose that's why.


I wonder, what technology the database is implemented in.


Do you really need to include divorce cases?


This is unbelievable. It has speeding tickets.


Found out an article I wrote in 2014 was cited in a patent application and found a speeding ticket I paid off. Cool!


I discovered that I have an open warrant for an unpaid fine and will get that sorted out. Very useful site


Page 1 of 78 total cases for: wikileaks


Nice. Could you add a URL parameter for the search term, so it is possible to link to a query?


Thankfully none of the (many) speeding tickets I got in my youth are showing up.


It found my 2 patents, cool!


This is a pretty big violation of privacy, especially for people whose criminal records have been expunged.

It could satisfy some people's curiosity or if they are a lawyer, they could save a few grand on PACER, but for everyone else this is a privacy disaster.

I hope you take this down eventually.


You don't need to be a lawyer to get a PACER account. I'm a felon, and I have a PACER account.


Hard agree. It feels like an invasion of privacy in line with the cancerous people-search/background-search websites that make available everything from your home address and phone number to your pet's name.


This is excellent, with super fast results.

What is the tech stack?


Thank you!

Replied to this comment here with some additional info: https://news.ycombinator.com/item?id=30399881#unv_30400160


Page 1 of 2,970 total cases for: donald trump


Page 1 of 2,685 total cases for: william jefferson clinton


yes more specialized search engines :D


Simple judyrecords continuation search from the command line. Requires sed, netcat, stunnel, flex and a C compiler.

    setup:    
    flex -8iCrf 045.l;
    cc  -std=c89 -Wall -pedantic -pipe lex.yy.c -static -o yy045; 
    flex -8iCrf 046.l;
    cc  -std=c89 -Wall -pedantic -pipe lex.yy.c -static -o yy046;  
    stunnel 1.cfg
    echo 127.39.100.156 www.judyrecords.com >> /etc/hosts;
    usage:
    1.sh query > results.htm
    1.sh < results.htm >> results.htm
    
        #!/bin/sh 
        # http/1.1 pipelining: only allowed 3 req then connection closes :(
        x0=$(printf '\r\n');
        x1=www.judyrecords.com;
        x2=$(printf 'host: '${x1}'\r\n');
        x4=$(printf 'connection: close\r\n\r\n');
        case $# in :)
        ;;0) # get next page
        x5=$(sed -n '1{s/.*x3 //;s/ .*//p;};
        /Go to next page/{s/.*href=\"//;s/\".*//p;}');
        x6=$(echo "${x5}"|sed -n \$p);
        x5=$(echo "${x5}"|sed -n 1p);
        test "${x5}" -a "${x6}"||exit 1;
        sed "s/ *//" <<eof|nc -vv ${x1} 80|yy045;
        GET ${x6} HTTP/1.1
        ${x2}
        cookie: ${x5}
        ${x4}
    eof
        ;;*) # get cookie (x3), then get 1st page
        S=$(echo "$@"|yy046);
        x3=$(sed "s/ *//" <<eof|nc -vv ${x1} 80|sed -n "s/.*session=/session=/;s/\;.*/${x0}/p";
        GET /addSearchJob?search="$S" HTTP/1.1
        ${x2}
        ${x4}
    eof
        );echo "<!-- x3 ${x3} -->"|sed "s/${x0}//g";
        echo "<base href=https://${x1} />";
        sed "s/ *//" <<eof|nc -vv ${x1} 80|yy045
        GET /getSearchResults/?page=1 HTTP/1.1
        ${x2}
        cookie: ${x3}
        ${x4}
    eof
        esac;
        exit
    
        1.cfg:
    
        debug=debug 
        pid=/tmp/1.pid
        foreground=no
        [ judyrecords ]
        accept=127.39.100.156:80 
        client=yes
        connect=54.39.100.156:443
        options=NO_TICKET
        options=NO_RENEGOTIATION
        renegotiation=no
        sni=
        sslVersion=TLSv1.3
    
        045.l:
        
        /* chunked transfer decode */ 
        /* do not use when carving out attachments, e.g., PDFs */
         int fileno (FILE *);
        xa "\15"|"\12"
        xb "\15\12" 
        %option noyywrap nounput noinput 
        %%
        ^[A-Fa-f0-9]+{xa}
        {xa}+[A-Fa-f0-9]+{xa}
        {xb}[A-Fa-f0-9]+{xb}
        %%
        int main(){ yylex();exit(0);}
    
        046.l:
    
        /* URL encode */
          int fileno(FILE *);
          #define p(x,y) {putchar(37);putchar(x);putchar(y);}
        %option nounput noinput noyywrap
        %%
        \x20 p(50,48); /* space */
        \x21 p(50,49); /* exclamation mark */
        \x22 p(50,50); /* double quote */
        \x23 p(50,51); /* pound sign */
        \x24 p(50,52); /* dollar sign */
        \x25 p(50,53); /* percent sign */
        \x26 p(50,54); /* ampersand */
        \x27 p(50,55); /* single quote */
        \x28 p(50,56); /* opening parenthesis */
        \x29 p(50,57); /* closing parenthesis */
        \x2A p(50,42); /* asterisk */
        \x2B p(50,43); /* plus sign */
        \x2C p(50,44); /* comma */
        
        \x2F p(50,47); /* forward slash */
        
        \x3A p(51,58); /* colon */
        \x3B p(51,59); /* semi-colon */
        \x3C p(51,60); /* less than */
        \x3D p(51,61); /* equals sign */
        \x3E p(51,62); /* greater than */
        \x3F p(51,63); /* question mark */
        \x40 p(52,64); /* cuneiform */
        
        \x5B p(53,91); /* opening bracket */
        \x5C p(53,92); /* backslash */
        \x5D p(53,93); /* closing bracket */
        \x5E p(53,94); /* caret */
        
        \x60 p(53,96); /* backquote */
        
        \x7B p(53,123); /* opening brace */
        \x7C p(53,124); /* vertical bar */
        \x7D p(53,125); /* closing brace */
        \15
        %%
         int main(){ yylex();exit(0);}
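A note on 045.l: it deletes anything shaped like a chunk-size line, so it can also delete lines of page content that consist only of hex digits (a bare year such as 2022, for instance). A length-driven decoder avoids that by reading each chunk size and copying exactly that many bytes. A minimal C sketch, assuming the whole body (everything after the header block) is already in memory; the function name `dechunk` is hypothetical:

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: decode an HTTP/1.1 chunked body held in memory.
 * in/in_len is everything after the blank line ending the headers.
 * Decoded bytes go to out, which needs room for in_len bytes.
 * Returns the decoded length, or -1 on malformed or truncated input.
 * Chunk extensions (";name=value") are skipped; trailers are ignored. */
long dechunk(const char *in, size_t in_len, char *out)
{
    size_t pos = 0, n = 0;
    assert(in != NULL && out != NULL);
    for (;;) {
        size_t size = 0, digits = 0;
        /* chunk-size in hex; at most 15 digits so it fits a 64-bit size_t */
        while (pos < in_len && isxdigit((unsigned char)in[pos])) {
            int c = tolower((unsigned char)in[pos++]);
            size = size * 16 + (size_t)(isdigit(c) ? c - '0' : c - 'a' + 10);
            if (++digits > 15)
                return -1;
        }
        if (digits == 0)
            return -1;
        /* skip any chunk extension, then require CRLF */
        while (pos < in_len && in[pos] != '\r')
            pos++;
        if (pos + 1 >= in_len || in[pos + 1] != '\n')
            return -1;
        pos += 2;
        if (size == 0)                   /* last-chunk: done */
            return (long)n;
        if (size > in_len - pos)         /* chunk runs past the input */
            return -1;
        memcpy(out + n, in + pos, size);
        n += size;
        pos += size;
        /* chunk data must be followed by CRLF */
        if (pos + 1 >= in_len || in[pos] != '\r' || in[pos + 1] != '\n')
            return -1;
        pos += 2;
    }
}
```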


Fixed error in 046.l

    /* percent encode */
      int fileno(FILE *);
      #define p(x,y) {putchar(37);putchar(x);putchar(y);}
    %option nounput noinput noyywrap
    %%
    \x20 p(50,48); /* space */
    \x21 p(50,49); /* exclamation mark */
    \x22 p(50,50); /* double quote */
    \x23 p(50,51); /* pound sign */
    \x24 p(50,52); /* dollar sign */
    \x25 p(50,53); /* percent sign */
    \x26 p(50,54); /* ampersand */
    \x27 p(50,55); /* single quote */
    \x28 p(50,56); /* opening parenthesis */
    \x29 p(50,57); /* closing parenthesis */
    \x2A p(50,65); /* asterisk */
    \x2B p(50,66); /* plus sign */
    \x2C p(50,67); /* comma */
    
    \x2F p(50,70); /* forward slash */
    
    \x3A p(51,65); /* colon */
    \x3B p(51,66); /* semi-colon */
    \x3C p(51,67); /* less than */
    \x3D p(51,68); /* equals sign */
    \x3E p(51,69); /* greater than */
    \x3F p(51,70); /* question mark */
    \x40 p(52,48); /* at sign */
    
    \x5B p(53,66); /* opening bracket */
    \x5C p(53,67); /* backslash */
    \x5D p(53,68); /* closing bracket */
    \x5E p(53,69); /* caret */
    
    \x60 p(54,48); /* backquote */
    
    \x7B p(55,66); /* opening brace */
    \x7C p(55,67); /* vertical bar */
    \x7D p(55,68); /* closing brace */
    \15
    %%
    int main(){ yylex();exit(0);}
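An alternative that avoids a hand-maintained table (and its hex-digit slips) is to pass through only RFC 3986's unreserved characters (A-Z, a-z, 0-9, -, ., _, ~) and emit %XX for every other byte. A minimal C sketch; the name `percent_encode` is hypothetical, and it is not a drop-in replacement for 046.l, which also strips carriage returns and passes newlines through:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: percent-encode s for a URL query value.
 * RFC 3986 unreserved characters (A-Z a-z 0-9 - . _ ~) pass through;
 * every other byte, including spaces, quotes and UTF-8 bytes, becomes
 * %XX with uppercase hex digits. Returns the encoded length, or
 * (size_t)-1 if out_size is too small for the worst case. */
size_t percent_encode(const char *s, char *out, size_t out_size)
{
    static const char hex[] = "0123456789ABCDEF";
    size_t n = 0;
    assert(s != NULL && out != NULL);
    if (out_size < 3 * strlen(s) + 1)    /* every byte may become %XX */
        return (size_t)-1;
    for (; *s != '\0'; s++) {
        unsigned char c = (unsigned char)*s;
        if ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
            (c >= '0' && c <= '9') ||
            c == '-' || c == '.' || c == '_' || c == '~') {
            out[n++] = (char)c;
        } else {
            out[n++] = '%';
            out[n++] = hex[c >> 4];      /* high nibble */
            out[n++] = hex[c & 0x0F];    /* low nibble */
        }
    }
    out[n] = '\0';
    return n;
}
```

Because this encodes a newline as %0A, the trailing newline that `echo "$@"` adds would need to be trimmed before encoding rather than relying on command substitution to drop it.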


Super interesting. Can you share screenshots or video of how this all comes together?


Crazy


Oh, this includes patents. Weird.


Yeah - is there no way to filter out patents? Bit frustrating.


Phrase exclusion syntax is described in Search Tips. For patents, it’s appending -patent to your search term.


The best way currently is to just add -patent to the search.
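How exclusion works internally isn't described here; conceptually, a leading - turns a term from "must appear" into "must not appear". A minimal C sketch of that rule over a single document's term list (function names are hypothetical; a real engine would more likely subtract one posting list from another):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return 1 if the document's terms contain the tlen-byte term t. */
static int has_term(const char *const *terms, size_t n,
                    const char *t, size_t tlen)
{
    size_t i;
    for (i = 0; i < n; i++)
        if (strlen(terms[i]) == tlen && strncmp(terms[i], t, tlen) == 0)
            return 1;
    return 0;
}

/* Return 1 if a document (given as an array of its terms) matches a
 * space-separated query: plain terms must appear, and terms with a
 * leading '-' (e.g. "smith -patent") must not appear. */
int query_matches(const char *query, const char *const *terms, size_t n)
{
    const char *p = query;
    assert(query != NULL);
    while (*p != '\0') {
        int exclude;
        size_t len;
        while (*p == ' ')
            p++;
        if (*p == '\0')
            break;
        exclude = (*p == '-');
        if (exclude)
            p++;
        len = strcspn(p, " ");
        /* fail if a required term is missing or an excluded one is present */
        if (len > 0 && has_term(terms, n, p, len) == exclude)
            return 0;
        p += len;
    }
    return 1;
}
```

For example, "smith -patent" matches a record whose terms are smith, speeding, ticket, but not one whose terms are smith, patent, widget.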


This is... not great. It's crucial that these records be open to public inspection. But instant full-text search of the entire dockets of 630M cases feels wrong, invasive, and dangerous to me.

It's yet another instance of panopticon surveillance now being too cheap to meter. I think our society needs to come to grips with this new reality and figure out what to do about it.

Or are we all just cool with this?


These records have always been available to people with money to spend on a lawyer with a subscription. So what you're complaining about is that normal people can also access the information now.


Quantity has a quality of its own. To use a similar example, arrest and imprisonment records are public data in my country. But you have to actually go to the courthouse and fill out some paperwork and/or hire a lawyer to do it for you.

This has consequences. For example, in some US states it takes a few seconds for an employer to find out a candidate was once arrested while drunk, or has a conviction for a minor offense from 15 years ago. And employers do that sort of search routinely, because it's free and easy. Only someone being targeted for a specific background check gets that treatment here, because it's not so easy.

Same argument applies to, for example, reading the previous divorce case for someone you're dating. Only a real weirdo would do that here, in part because it involves time and money. If it's freely available online, I do think it would be a lot more common.

I don't know whether it'd be better or worse to have such information more accessible, but it can change things.


> Only a real weirdo would do that here, in part because it involves time and money.

I think your parent's point is that money isn't an issue for the rich. A billionaire doesn't care whether it costs $150 or $1,000 to find out. So suddenly information becomes a class issue. Either it should be available to nobody or to everybody; money shouldn't factor into it.


I don't think that's a valid argument.

A lot of policies or laws don't affect billionaires the same way. We don't base speeding fines on income like Norway does. Nobody is changing any laws to make their impact on billionaires proportional.


Nice false equivalence.

Lawyer: duty-bound professional, is an officer of the court, can be publicly disbarred, very expensive degree that needs to be paid off

Some guy on the internet with an axe to grind: ???


I believe in California you only need to pass the bar to become a lawyer, no expensive degree required.


You're all missing the point... ANYbody with a LexisNexis subscription, or a Bloomberg terminal, or one of those background check sites, already has this exact capability. It's not new.

You don't need to be a lawyer to access any of it... I think the other poster simply meant that lawyers generally already have Lexis subscriptions.

Also, the various court databases this site is searching are ALREADY online and publicly available, and have been for years. This is just providing a free, unified interface with a fast search index.


> This is just providing a free, unified interface with a fast search index.

Yes, and that is a phase change difference. It's not a trivial enhancement.


So you think the $10/month fee for existing services is what induced the phase change? I've done hundreds of these searches, paid maybe $20 total.

I have a different theory about why this bugs you... Previous to today, you were ignorant of the fact that these records were available online so cheaply and quickly. Nothing has really changed, except your own anxiety levels as your worldview struggles to absorb this information. But your brain wants the change to be external, because that's less threatening than the realization that this capability has been lurking out there in the world, all along.


Ha! It bugs me for exactly the reasons I stated above.


At some level I get the angst about typing someone's name, especially if it's fairly unusual, and getting back a whole lot of information about, in this case, mostly legal-related stuff and in others past addresses, things they've written etc. for free. (And, if you know something about them you can probably sift the returns somewhat effectively.) You may be able to find out a lot about your date, your neighbor, etc.

On the other hand, outside of casually checking out someone, the reality is that this has long been available to anyone willing to spend a very few bucks to do so.


That does not appear to be correct. You can see the requirements to be admitted to the California Bar here: https://www.calbar.ca.gov/admissions/requirements

Note in particular the education requirement:

> Here are all the options:

> - Three or four years of study at a law school accredited by the American Bar Association

> - Four years of study at a State Bar-registered, fixed-facility law school

> - Four years of study with a minimum of 864 hours of preparation at a registered unaccredited distance-learning or correspondence law school

> - Four years of study under the supervision of a state judge or attorney

> - A combination of these programs

If all you've done is pass the bar exam, you can go to hell. It may not even be possible; the requirements for the California exam are behind a login wall, but other states have restricted eligibility to take the exam based on the testee's education. I assume they didn't want to be embarrassed by having the wrong sort of person pass the bar.


I stand corrected.


I’m sure both of the attorneys who have done an apprenticeship are happy about that.


The happiness of lawyers is something I am quite happy to not care about. :)


I think you may be misunderstanding what this is... All of these documents were ALREADY public records, and were ALREADY available online. Most US courts have been publishing these records online, for a while now.

And there are ALREADY other websites/search products that provide a unified search interface... LexisNexis is probably the biggest/oldest, and I believe Bloomberg also has this feature... There are dozens (if not hundreds) of cheap public record search websites that charge $10/month for it, too.

If you're surprised by all this, you haven't been paying attention... For a few decades now.


Powerful corporate and government actors have massive surveillance and data warehousing capabilities that aren't going away. At the very least, putting those powers into the hands of the public helps to level the playing field.

Society will have to change to accommodate the digital panopticon. I don't see the digital panopticon going away, though.


> putting those powers into the hands of the public helps to level the playing field

Agreed, but ...

> Powerful corporate and government actors have massive surveillance and data warehousing capabilities that aren't going away.

To nitpick: They aren't going away as long as we spread that message. It's not easy, but we can make them go away. People do accomplish things and change the world - just compare today's world with 500 years ago; all the differences the result of people changing things. Defeatism is trendy, and who benefits? (The status quo.)


> To nitpick: They aren't going away as long as we spread that message. ... Defeatism is trendy, and who benefits?

It's not defeatism-- it's just being realistic. I don't believe there's any useful method to make government actors comply with the law. I have an admittedly US perspective, but consider as evidence the FBI under J. Edgar Hoover, the NSA and the subsequent Church Committee hearings, and Snowden's disclosures. The power afforded by mass surveillance and data warehousing is too attractive not to be abused.


> It's not defeatism-- it's just being realistic.

You must have heard that line from pessimists 10,000 times before. There would be no startups, science, democracy, etc. if people believed it. We'd be living in caves - 'let's be realistic, we've been living in them for 190,000 years!'

> I don't believe there's any useful method to make government actors comply with the law.

The evidence is overwhelmingly otherwise: Many, many government actors have been caught, prosecuted and punished, at every level - including multiple Presidents, at least one Vice President (off the top of my head), members of Congress, federal judges, generals and admirals, and more, and that's just the federal level.

Governments have widely differing levels of corruption, and the US has in the past been one of the best - so it's effective. Other countries are also more and less effective than the US, and we can see what they do and what works. There is plenty of research. Nothing is stopping you, but you.

The current trend in defeatism - against all evidence, in a country with one of the most effective governments in the history of humanity - serves someone's interests. Who? Who benefits from spreading this message?


I don't see any problem with this. These cases are in the public record, why should the public not have the ability to search them for free without requiring access to expensive legal indices?


The public has always had the ability to search them, it's just been more difficult to do in the past.

Lowering the barriers to this is not necessarily a good thing. For example: if I have to drive to the county clerk's office to get records, I am unlikely to do so. This means that if I need the information, I will go get it, but if I do not need it, I won't bother.

The patent section being so easily searched is very useful and I see no downsides there.


Sure, why not? It's not like anything embarrassing or anything you want kept secret would be in court proceedings.

https://www.judyrecords.com/record/vvfe9mivbec8c


When the giant Equifax hack happened, creditors had to start doing more due diligence to make sure they weren't allowing fraudulent borrowing, since it was happening so frequently.

So optimistically maybe having lots of information public and easily searchable like this ultimately leads to a similar outcome. Like maybe it forces us to stop using date of birth to verify identity since it's so easy to find anyone's date of birth.

If the information is going to be out there maybe it's safer for everyone to have it than just a few people. If just a few people have it, and they use it to screw just a few other people, there's no pressure to fix it.

Or of course maybe that's wishful thinking.


Don't lawyers already have access to case law like this? I feel like this is not a new thing, but giving access to everyone is novel.

I could be wrong on my facts.


Generally you’ve had to pay for an expensive service (LexisNexis), or go to the courthouse yourself to pull the records. Search was also a bit of a black art.

So generally easy to hide in the noise. Here you can just put in a name, and off you go.


Lexis has the best search capabilities, but there are dozens of cheap clones now that start at $10/month to search these same records.


The public has access to most local court data in my state (Ohio, US) through websites run by the various local courts. A state-level database for government use is, as far as I know, still not actually available (though it has been in planning and some phase of execution for 10+ years).


> Don't lawyers already have access to case law like this?

Yes, through expensive services like Westlaw and Lexis.


I couldn't find my name, and I know it should be in here. So I'm not that worried yet...


There's no escape. It's just a matter of time.



There are public court records (criminal, civil), and there are non-public court records (e.g. sealed - juvenile, divorce, etc.)

As far as I can tell, all of this data is public in nature.

While it may feel weird to type in someone's name and see their history with regard to legal filings... that is the society we live in: an open society.

I think aggregating a number of disconnected data sources for search is absolutely a legitimate use of the data.


I have some records that are sealed, but show up in this database. So there are records that were once ‘public’ but are no more, but this database makes them public again.


FYI, I found divorce records for a couple of people I know. So I wouldn’t assume those hard-and-fast rules apply consistently.


Fair enough - in my state they are limited to parties involved and their counsel.

The public can still see the filing and result (when the divorce was granted), but the actual documents are restricted so as not to air all of one's dirty laundry unnecessarily.


This is amazing. Can you share any info on how you were able to compile so much info from different sources? In my limited experience of hunting for legal filings, it seemed like every court had its own system, with nothing standardized or programmatic.

Thanks!


The search uses Elasticsearch 7 for full-text search. It's been extremely fast and has worked very well. You're right: court data is scattered across many different systems and needs to be aggregated, which is a difficult process.
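For readers unfamiliar with Elasticsearch, here is a minimal sketch of what a full-text query body might look like. It uses the standard `match` query from the Elasticsearch query DSL; the field name `content` and the helper `build_full_text_query` are hypothetical, since the site's actual mapping isn't public.

```python
import json


def build_full_text_query(text: str, size: int = 20, from_: int = 0) -> dict:
    """Build an Elasticsearch query body for a simple full-text search.

    Assumption: documents store their text in a field called "content"
    (hypothetical; the real judyrecords mapping is not published).
    """
    return {
        "query": {
            "match": {
                "content": {
                    "query": text,
                    # Require every term to appear, which tends to suit name searches.
                    "operator": "and",
                }
            }
        },
        "from": from_,  # offset for pagination
        "size": size,   # number of hits per page
    }


if __name__ == "__main__":
    print(json.dumps(build_full_text_query("breach of contract"), indent=2))
```

A body like this would be sent to `POST /<index>/_search`; the index name depends on the deployment.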


Are you using freelaw's code to scrape all the different servers? Why are there no contact details on the site? I don't understand the mystery and black ops nature of this thing. It feels like there is some sort of conspiracy here that I've yet to uncover!


There are, I think, about 5 million opinions from that project, yes. I wouldn't say it's black ops; feel free to contact me on Reddit.


How much RAM does that use up? What’s the latency? Is it sharded? Is it a cluster? So many questions.


There are 2 search boxes running: one stores the search index without source, and the other stores the source, which is used only for highlighting. Searches usually take under 200 ms, and search results pages (SRPs) and individual record pages usually take less than 20 ms. The 2 ES nodes are not formally part of a single cluster because of the difference in how their indexes are stored. Another box runs a traditional LAMP setup. Feel free to send a message on Reddit if you're interested in more detail.
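A rough sketch of how a source-less search index plus a separate highlight index could fit together, based only on the description above. `"_source": {"enabled": false}` is a real Elasticsearch mapping option that shrinks the index, but by default the highlighter reads the original text from `_source`, which is why a second copy would be needed. The field name `content` and all helper names are hypothetical; this is a guess at the shape, not the author's code.

```python
def search_index_mapping() -> dict:
    # _source disabled: smaller index, but hits return only IDs and scores,
    # and the original text cannot be fetched or highlighted from here.
    return {
        "mappings": {
            "_source": {"enabled": False},
            "properties": {"content": {"type": "text"}},
        }
    }


def highlight_index_mapping() -> dict:
    # _source kept (the default) so the highlighter can read the original text.
    return {"mappings": {"properties": {"content": {"type": "text"}}}}


def extract_ids(search_response: dict) -> list:
    # Phase 1: pull document IDs out of a response from the search index.
    return [hit["_id"] for hit in search_response["hits"]["hits"]]


def highlight_query(text: str, ids: list, fragment_size: int = 150) -> dict:
    # Phase 2: re-run the text match on the highlight index, restricted to the
    # page of IDs from phase 1, and ask for highlighted fragments.
    return {
        "query": {
            "bool": {
                "must": {"match": {"content": text}},
                "filter": {"ids": {"values": list(ids)}},
            }
        },
        "highlight": {"fields": {"content": {"fragment_size": fragment_size}}},
        "size": len(ids),
    }
```

The expensive ranking work stays on the compact index, and only one page of results touches the larger, source-bearing index.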


How large is the index?

How do you manage that between RAM and SSD?


Search - Index is ~373GB. AMD Epyc 7371 - 16c/32t - 3.1 GHz/3.8 GHz. 512 GB ECC 2400 MHz. 2×1.92 TB SSD NVMe

Highlight - Index is ~620GB. Xeon-D 2141I - 8c/16t - 2.2 GHz/3 GHz. 64 GB ECC 2133 MHz. 2×1.92 TB SSD NVMe
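Some quick arithmetic on the figures above, relevant to the RAM-versus-SSD question. This is an optimistic upper bound that ignores memory taken by the Elasticsearch JVM heap and the OS; the helper name is hypothetical.

```python
def ram_coverage(index_gb: float, ram_gb: float) -> float:
    """Fraction of an index that could sit in RAM (page cache), capped at 1.0."""
    return min(1.0, ram_gb / index_gb)


# Figures taken from the comment above.
boxes = {
    "search": {"index_gb": 373, "ram_gb": 512},
    "highlight": {"index_gb": 620, "ram_gb": 64},
}

for name, b in boxes.items():
    share = ram_coverage(b["index_gb"], b["ram_gb"])
    print(f"{name}: up to {share:.0%} of the index fits in RAM")
```

On these numbers the search index can plausibly be served almost entirely from memory, while the highlight box (about 10% coverage) must read mostly from NVMe. That would be tolerable if highlighting only touches one page of results per query, though that is an inference, not something the author stated.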

Search and highlighting are handled asynchronously from a queue.
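The comment doesn't say which queue is used, so here is only a stand-in: an in-process `asyncio.Queue` with a small pool of workers that dispatch "search" and "highlight" jobs to separate handlers. The handlers are hypothetical placeholders for requests to the two ES nodes.

```python
import asyncio


async def handle_search(query: str) -> str:
    await asyncio.sleep(0)  # stand-in for a request to the search node
    return f"ids for {query!r}"


async def handle_highlight(query: str) -> str:
    await asyncio.sleep(0)  # stand-in for a request to the highlight node
    return f"snippets for {query!r}"


HANDLERS = {"search": handle_search, "highlight": handle_highlight}


async def worker(queue: asyncio.Queue, results: dict) -> None:
    # Each worker pulls (kind, query) jobs forever until cancelled.
    while True:
        kind, query = await queue.get()
        try:
            results[(kind, query)] = await HANDLERS[kind](query)
        finally:
            queue.task_done()  # always mark done so queue.join() can return


async def process(jobs: list, n_workers: int = 4) -> dict:
    queue: asyncio.Queue = asyncio.Queue()
    results: dict = {}
    for job in jobs:
        queue.put_nowait(job)
    tasks = [asyncio.create_task(worker(queue, results)) for _ in range(n_workers)]
    await queue.join()  # wait until every job has been processed
    for t in tasks:
        t.cancel()      # workers loop forever, so stop them explicitly
    await asyncio.gather(*tasks, return_exceptions=True)
    return results


if __name__ == "__main__":
    print(asyncio.run(process([("search", "smith"), ("highlight", "smith")])))
```

Queuing decouples request handling from the slower search boxes, so a burst of queries waits briefly instead of overloading the nodes.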


Awesome insight, and site. Thanks


[flagged]


So a public records search amounts to that?


Who made this site? How is it funded? They don't reveal themselves afaict. Why should I trust it?


The site is meant to be an index, and you should verify information from the source.


What wouldn’t you trust? It’s simply indexing public info.


Think of it like a search engine, like Google: There is a lot of editorial power in a search engine, such as what is listed, what is listed first, what is excluded, and accuracy (correctness and completeness).

Always know your source; there are no exceptions - especially in the 'post-truth' era.


This sucks for anyone who has bad stuff in their past and is trying to live it down.

It used to be what happened in the past could be put to rest.

Not good for society.



Can you explain your thought process? How is that related?


Not the original poster but: Your service costs money to run. You're not, so far as I can tell, a recognized non-profit doing this out of the goodness of your heart. Given that you're not running ads, all the monetization angles are yucky.

The search query stream of your site alone is immediately monetizable for targeting by "legitimate businesses" who sell "reputation management services." (People who expect to find records are searching for their own names. People who suspect others might have records are searching for those names. Source: see every second comment on this topic) Notably, nothing in your terms suggests this isn't already your business model - you don't need to collect or retain personal information about the visitors to your site to monetize the search query stream. In fact, it's probably "better" not to collect PII, because then you can claim that the search queries are "just strings the visitor provided voluntarily" and that you have no idea whose names those are.

Also: your dataset is littered with records that are properly under seal, including juvenile records. You are NOT relieved of potential liability for disseminating these just because they're included in your dragnet, should they be found through your service and used in a way that causes actual damages.

I'd strongly caution anyone against using this site at all, much less searching names. This site is plausibly an input to an extortion machine at scale.



