Rubbish. I built a pipeline to handle document classification that successfully took care of ~70TB of mostly unstructured and unorganized data, by myself, in a couple of weeks, with no data engineering background whatsoever. This was quite literally impossible a couple years ago. The amount of work that saved was massive, and it's going to save us a shit ton of money on storage costs. Decades' worth of invoices and random PDFs are now siloed properly so we can organize and sort them. This was almost intractable a few years ago.
We came up with different categories of tags. I should clarify: the AI didn't actually do the sorting, it did the tagging, which made sorting tractable. After the tagging it's just a matter of grouping, either by algorithm or by a human.
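The grouping step described above can be sketched in a few lines. Everything here is hypothetical for illustration (the paths, the tag names, and the idea that each document record carries its LLM-generated tags as a list):

```python
from collections import defaultdict

# Hypothetical document records; in practice the "tags" field would come
# from the LLM tagging pass described in the comment above.
documents = [
    {"path": "scans/0001.pdf", "tags": ["invoice", "2019"]},
    {"path": "scans/0002.pdf", "tags": ["contract", "2019"]},
    {"path": "scans/0003.pdf", "tags": ["invoice", "2021"]},
]

# Once tags exist, grouping is a plain algorithmic pass:
# bucket every document under each of its tags.
groups = defaultdict(list)
for doc in documents:
    for tag in doc["tags"]:
        groups[tag].append(doc["path"])

print(groups["invoice"])  # all invoices, regardless of year
```

The point is that the hard part (deciding what a document *is*) happens once, during tagging; everything downstream is ordinary data wrangling.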
But obviously it would be far from the accuracy an LLM can achieve. E.g. generating search keywords, tags, and other types of metadata for a given document.
Yup, that's exactly it. By being able to tag things with all sorts of in-house metadata, we were then able to search and group things extremely accurately. There was still a lot of human in the mix, but this took the whole task from "idk if we can even consider doing this" to "great, we can break this down and chip away at it over the next few months/throw some interns at it".
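The search side of this is equally simple once the metadata exists. A minimal sketch (field names and tags are made up, not the actual in-house schema): filter for documents whose tag set contains every tag you're looking for.

```python
def search(documents, required_tags):
    """Return paths of documents carrying all of the required tags."""
    wanted = set(required_tags)
    return [d["path"] for d in documents if wanted <= set(d["tags"])]

# Hypothetical records, with tags assumed to come from the LLM pass.
documents = [
    {"path": "scans/0001.pdf", "tags": ["invoice", "2019"]},
    {"path": "scans/0002.pdf", "tags": ["contract", "2019"]},
    {"path": "scans/0003.pdf", "tags": ["invoice", "2021"]},
]

print(search(documents, ["invoice", "2019"]))
```

A human (or an intern) can then review each narrowed-down bucket instead of the whole corpus, which is where the "chip away at it" part comes in.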
Yeah, I don't know - hearing arguments that this was already done by ML algorithms sounds to me like "moving from place A to B existed already before cars". But it seems like a common sentiment. So much of what traditional ML attempted required massive amounts of training, and training data specific to your domain, before you could use it; an LLM can do it out of the box, and actually handle nuance.
I think organizing and structuring unorganized data from the past is a massive use case that seems heavily underrated by so many right now. People spend a lot of time figuring out where to find some piece of data, internally in companies, etc.