Hacker News
My 2024 AI Predictions (axflow.dev)
54 points by nichochar on Jan 8, 2024 | 34 comments



> Unstructured document parsing

If I had to invest in any one area of LLM usage, it would be this. There is so much unstructured data in the world, and converting things like legal contracts or chatlogs into structured, queryable data is absurdly powerful. Nobody wants to talk about this usage for LLMs because they're too busy making TikToks about how GPT4 actually has a soul or whatever, but this will be the lasting legacy of LLMs after the hype around generative AI dies out.

> A decent engineer will likely be able to write a slack-like application, definitely good enough to cancel the 500k/year contract, in a couple of months.

And this is why generative AI is massively overhyped: the people hyping it don't understand the true value of the products they allegedly replace. Very similar to the crypto/blockchain hype where people who understood nothing about banking or logistics insisted that blockchain would solve all the problems there. If you think a corp is paying Slack $500k/year because it's hard to write a piece of software that can send messages between people in an organization, you're completely off base. (IRC exists, can do this and is free by the way.)


> If I had to invest in any one area of LLM usage, it would be this. There is so much unstructured data in the world, and converting things like legal contracts or chatlogs into structured, queryable data is absurdly powerful.

I tested poorly OCR'd text from a late-1800s magazine and was pretty impressed with the results from both GPT-4 and even Bard. Beyond making inferences from the letters themselves, it was able to infer the historically accurate terms from context. However, a single prompt asking it to correct words didn't work as well as two prompts, with the first one listing candidates for badly OCR'd words.

While that might not seem like what you're talking about, it has the benefit of adding to the overall corpus that future models are trained on. Also, with the NYT lawsuit, I'd presume fixing the OCR of old magazines and articles would be a pretty good way to fill the gap left behind.
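The two-pass approach described above could be sketched roughly like this. This is a minimal illustration, not the commenter's actual code: the prompt wording is invented, and `call_llm` stands in for whatever chat-completion API you use.

```python
# Sketch of the two-pass OCR correction: first ask the model to flag
# suspect words, then ask it to correct only those words in context.
# Prompts are illustrative; call_llm is any callable(str) -> str.

def first_pass_prompt(ocr_text: str) -> str:
    """Ask the model only to list suspect words, not to rewrite anything."""
    return (
        "The following text was OCR'd from a late-1800s magazine and "
        "contains recognition errors. List the words that look like OCR "
        "mistakes, one per line, without correcting anything else:\n\n"
        + ocr_text
    )

def second_pass_prompt(ocr_text: str, suspects: list[str]) -> str:
    """Ask the model to correct only the flagged words, in context."""
    return (
        "Correct only these likely-misrecognized words in the text below, "
        "choosing historically plausible terms from context: "
        + ", ".join(suspects)
        + "\n\nText:\n"
        + ocr_text
    )

def correct_ocr(ocr_text: str, call_llm) -> str:
    """Two-pass pipeline: flag suspects, then correct them."""
    suspects = call_llm(first_pass_prompt(ocr_text)).splitlines()
    return call_llm(second_pass_prompt(ocr_text, suspects))
```

Splitting the task this way narrows what the second prompt is allowed to change, which may be why it outperformed the single-prompt version.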


> There is so much unstructured data in the world

Can you give a couple of specific examples? There are already sites like pdf.ai that let users chat with documents, are you thinking of something different?


The point isn't to have users chat with documents, it's to automatically parse unstructured data into structured data and store that somewhere for later use. As a real example, I deal with piles of legal contracts from which I need to extract specific information so I can perform later analysis, to be able to answer questions like "what specific model of widget is under contract here" and "what's the average contract value" and "what's the average contract value of widget XYZ". All stuff that's very easy to answer in SQL[0], once you have the data -- but extracting that data from legal documents, many of which are not in English, previously required a small army of contractors. Now it's been replaced with a local LLM that parses those relevant contract details into JSON which gets stored into a database. The accuracy is suitable for my use case, though not 100%.

[0] and very hard to answer with an LLM, as they are notoriously awful at doing any sort of math.
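Once the extracted fields land in a table, the aggregate questions above really are one-liners. A minimal sqlite sketch, with invented column names and data, just to make the point concrete:

```python
import sqlite3

# Illustrative schema: one row per contract, populated from the
# LLM-extracted JSON. Widget names and values are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contracts (widget TEXT, value REAL)")
conn.executemany(
    "INSERT INTO contracts VALUES (?, ?)",
    [("XYZ", 120_000.0), ("XYZ", 80_000.0), ("ABC", 50_000.0)],
)

# "What's the average contract value?"
avg_all = conn.execute("SELECT AVG(value) FROM contracts").fetchone()[0]

# "What's the average contract value of widget XYZ?"
avg_xyz = conn.execute(
    "SELECT AVG(value) FROM contracts WHERE widget = ?", ("XYZ",)
).fetchone()[0]
```

The database does the arithmetic exactly, which sidesteps the LLM-math problem from the footnote entirely.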


Thank you, I understand your use case better.

If possible, could you talk a bit about your process for training your model? It looks like it is specific to legal documents, how easy/hard would it be to do the same for other types of documents?

Also, what level of accuracy is good enough, for your use case?


The beauty of it is there's no model training involved: it's quite literally a prompt that reads something like "given the following document, output JSON that contains the following information: a field named `"widget"` containing the widget name, etc...", with the document included right in the prompt. It was written a while ago, so it's just a Python script that iterates over the source document, splitting it up to get around context-length limits and aggregating the results. Extremely simple, but it works great.

I don't have a really specific accuracy target but since my main interest is in the aggregated results it's not a huge deal if the aggregations are slightly off. (The manual human approach is not 100% accurate either, after all; it's extremely common for humans to make data entry mistakes.)
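The split-prompt-merge loop described above could look something like this. It's a sketch under assumptions: the field names and prompt wording are invented, the chunking is deliberately naive, and `call_llm` is a placeholder for whatever model API (local or hosted) is in use.

```python
import json

def chunk(text: str, max_chars: int = 8_000) -> list[str]:
    """Naive fixed-size chunking to stay under the context limit."""
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]

def extraction_prompt(doc_chunk: str) -> str:
    # Field names here are invented for illustration.
    return (
        "Given the following contract text, output JSON with these fields: "
        '"widget" (the widget name), "value" (the contract value as a '
        "number). Use null for anything not present in this excerpt.\n\n"
        + doc_chunk
    )

def extract(document: str, call_llm) -> dict:
    """Run every chunk through the model and merge the non-null fields."""
    merged: dict = {}
    for piece in chunk(document):
        fields = json.loads(call_llm(extraction_prompt(piece)))
        merged.update({k: v for k, v in fields.items() if v is not None})
    return merged
```

A real version would likely want JSON-mode output or retry-on-parse-failure around `json.loads`, but the shape of the pipeline is this simple.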


Real estate's a big one. If you wanted to find publicly-available historical information about who owned a house and how many times it changed hands, the format of the data will change completely from state to state and country to country. An LLM that can rip through millions of these documents (both digital and on paper with GPT-4V) and generate clean, normalized and structured JSON will be absolutely huge.

Healthcare's another. Exchanging information between benefits brokers is madness because of inconsistent data formats.

A more consumer-facing example: exporting/joining data from one or more apps to migrate into another app. This is a hellish process most of the time that an LLM could, theoretically, reduce to seconds (not counting execution time).
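In all three examples, the hard part the LLM solves is mapping wildly inconsistent source formats onto one normalized shape; the shape itself is ordinary code. A sketch for the real-estate case, with entirely invented field names, showing the validation layer you'd put between the model's JSON and your database:

```python
from dataclasses import dataclass

@dataclass
class Transfer:
    """One normalized ownership-transfer record. Fields are illustrative."""
    parcel_id: str
    grantor: str   # seller
    grantee: str   # buyer
    year: int

def validate(raw: dict) -> Transfer:
    """Coerce and sanity-check one LLM-emitted JSON object before storing.

    Raises KeyError/ValueError on malformed output, which is the signal
    to retry or flag the source document for review.
    """
    return Transfer(
        parcel_id=str(raw["parcel_id"]),
        grantor=str(raw["grantor"]).strip(),
        grantee=str(raw["grantee"]).strip(),
        year=int(raw["year"]),
    )
```

Keeping strict validation outside the model means a hallucinated or half-parsed record fails loudly instead of polluting the normalized dataset.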


An example would be if you had a large article of text recording the history of a given subject, and you wanted a table of years and the number of times a given event happened in each year. A task that used to take hours can now be done in less than a minute.


Some of these seem reasonable but I disagree with this:

> A decent engineer will likely be able to write a slack-like application, definitely good enough to cancel the 500k/year contract, in a couple of months.

A decent engineer can already crank out a working Slack prototype within a couple of months, and there are mature Slack alternatives today. There's a reason companies are paying $500k/year, and I doubt it's the code: maybe it's the enterprise support, the external integrations, or even just the name recognition.

Companies getting leaner may be true (it seems like this has already been happening the past couple years regardless of AI, and companies used to be lean in the 2010s).


Yes, an engineer could build a Slack clone, though the first version would probably be quite poor and lacking features. If you're not in the business of building chat applications, having to actually maintain such an application becomes a burden. You may save $500k for a few years, but down the road, when said engineer leaves, you'll end up paying the cost either to exit said app or to spend a lot more engineering effort on it.

There definitely is room for such applications, but a chat application probably won't set the business apart.


Totally agreed. If companies didn’t want to pay for Slack they would have jumped to the alternatives already.


Has anyone had any success in code generation? I feel like chatgpt usually completely fails to write even a small function correctly unless it's a very trivial or well known problem. I usually have to go back and forth for a good long while explaining all the different bugs to it, and even then it often doesn't succeed (but often claims it's fixed the bugs). The types of things it gets wrong makes it a bit hard to believe it could improve enough to really boost dev productivity this year.


Hi, author here.

This is a pretty hard problem. And I haven't found anyone that's too good at this, but here are some interesting players:

- https://www.phind.com/ is a custom model fine-tuned on code, and pretty damn good

- https://codestory.ai is a VSCode fork with an assistant built in. One of the things it does for you is write code, but imo that's not its biggest strength yet.

- https://sweep.dev have a bot where you create a GitHub issue and it writes the PR to fix it. They have between a 30% and 70% success rate. This is pretty bad, but they're one of the best today.

- https://sourcegraph.com is pivoting and building a copilot application (named Cody). This is pretty good, since Sourcegraph is great at understanding your code.


Have you tried Cody (https://cody.dev)? Cody has a deep understanding of your codebase and generally does much better at code gen than just one-shotting GPT4 without context.

(disclaimer: I work at Sourcegraph)


> A decent engineer will likely be able to write a slack-like application, definitely good enough to cancel the 500k/year contract, in a couple of months.

People are rightfully calling out this bit. It still wouldn't make sense for a Slack customer to make their own version of Slack in-house, but it does lower the bar for a lot of Slack competitors to get to feature parity much faster.


Also if this were true, we'd see this happening with existing platforms like Rocket Chat and Zulip. And likewise we should see velocity of open-source projects skyrocket.


I am unable to click any of the links in this article when reading with Safari on iOS.


The whole page gets selected instead. Maybe there's some JS to prevent copying it, but poorly configured.


On desktop there is a script running which, at the very least, tries to mimic custom styling of selecting text...for apparently absolutely no reason at all. I bet that script is configured for mouse events but not touch events. Very silly.


Fixed, it was a flex and z-index issue on my part, sorry.


Same with both Firefox and Chrome on Android.


Is the author famous for correctly predicting anything before?


Does this count as a form of appeal to authority?


Logical fallacies apply to structured, proof-like logical arguments. Many can still be useful heuristics.


I predicted that if I made the front page of HN someone might be a little toxic about me!


> I predict non-smartphone AI devices will fail. The AI device of the future is likely an iPhone or android phone with a dedicated GPU chip for AI.

I go back and forth on this. While I see this being the case for data collection wearables like humane or tab, it makes sense to have a personal AI computer like bedrock [0], tinybox [1], or a mac studio for running background tasks on personal data. If you're running agents that do more than chat, you need something that's going to be able to handle doing inference for extended periods of time without worrying about heat or battery life. You likely also want something capable of doing fine-tune level training on your personal inputs. A lot of the more interesting use-cases are on data you probably don't want to expose to a cloud provider. That said, probably Apple is eventually going to crush here as well, but maybe there's room for a challenger to develop as this niche opens up.

[0]: https://www.bedrock.computer/gal

[1]: https://tinygrad.org


Google's Coral AI already has a $25 USB-powered tensor inference accelerator that can do 4 TOPS.

I don't think it will take much more than that to get the rest of the way.


I think distributed TinyML (aka AIoT), with multiple Oura-like wearables on the body for (near-)total health monitoring, is a likely contender.


Am I the only one who does not immediately see a quality difference between the two photos in the embedded tweet?


I'm afraid so. The left one is much flatter and is clearly a drawing, while the right one looks almost like a photograph.


> I personally regularly use the “voice” version of chatGPT to brainstorm with it while I walk my dog. We sped past the Turing test so fast that no one even beat an eyelash about it

I don't think that just because the author has a pseudo-conversation with ChatGPT using voice as the interface means we've passed the Turing test.

They don't seem to be actively interrogating ChatGPT to determine whether it's a human or not - something that I'd expect would still be quite easy to do. And, as I understand it, the Turing test could be administered over text.


The truth is that the Turing test turned out to be useless. Whether we have passed it or not has no bearing on my life or anyone else's. The way I talk to ChatGPT isn't the way I talk to a real person, despite it already being capable of communicating with human language, teaching me things, and helping with my work and daily life. No real person would tolerate a turn-by-turn exchange of 2 minute monologues, but that's (apparently) what I want from an AI.

And millions of people are fooled into thinking GPT is a real person every day, with spam and robocalls and social media bots. Maybe it won't fool everyone all the time, but it can fool some people a lot of the time. And it's only going to get more sophisticated. The only ones concerned about the Turing test are 70 year old GOFAI professors -- everyone else is dealing with the practical realities of computers suddenly having language capabilities.


Speech-to-text is still pretty poor for live voice input into a computer. That's a significantly different use case from simple transcription.


Jan 2024: Still waiting for a decent package that can take context of how my previous test cases are written and write new ones



