Hacker News | Madmallard's comments

The difference people are neglecting to point out is the experiences we have versus the experiences the AI has.

We have at least 5 senses, our thoughts, feelings, hormonal fluctuations, sleep and continuous analog exposure to all of these things 24/7. It's vastly different from how inputs are fed into an LLM.

On top of that we have millions of years of evolution toward processing this vast array of analog inputs.


So, just connect LLMs to lava lamps?

Jokes aside, imagine you give LLMs access to real-time, world-wide satellite imagery and just tell it to discover new patterns/phenomena and correlations in the world.


Are you sure about that one?

What exactly will these agents be able to do with enough consistency, accuracy, and reliability that people will want to hire them over humans?

In my experience with even the most basic implementation of agents, i.e. customer service chat bots, I literally cannot stand interacting with them even once. They are extremely unhelpful and I will hang up or immediately ask to speak to a human.


Obviously your support chatbot will talk to your flavor of clawd that will call Claude Code that will code a solution that will be reviewed by Codex that will merge and release it and then will ping clawd that will send an email to the user announcing that their issue has been fixed. /s just in case

I’ve been involved in building a system that reads structured data from a special form of contracts from a specific industry. Prices, clauses, pick up, delivery, etc. A couple hundred datapoints per contract. We had many discussions around how to present and sell an imperfect system. The thing is, the potential customers are today transcribing the contracts manually, and we quickly realized that people make a ton of mistakes doing that. It became obvious when we were working on assertion datasets ourselves. It’s not a perfect system and you have to consider how you use the data (aggregating for price indexing, for instance), but we’re actually doing better than what people achieve when they have to transcribe data for hours a day.
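The aggregation point can be sketched in a few lines: if per-contract extraction errors are rare and uncorrelated, a robust aggregate such as a trimmed median barely notices them. This is a toy illustration only (the actual system and its aggregation method aren't described in the comment); the trim ratio is an assumption:

```python
from statistics import median

def aggregate_price_index(extracted_prices, trim_ratio=0.1):
    """Aggregate per-contract prices into a single index value.

    Uses a trimmed median so that a small fraction of transcription
    or extraction errors has little effect on the result.
    """
    if not extracted_prices:
        raise ValueError("no prices to aggregate")
    prices = sorted(extracted_prices)
    k = int(len(prices) * trim_ratio)  # drop the k lowest and k highest values
    trimmed = prices[k:len(prices) - k] if k else prices
    return median(trimmed)

# One wildly wrong transcription barely moves the index.
clean = [100.0, 102.0, 98.0, 101.0, 99.0, 100.0, 103.0, 97.0, 100.0, 101.0]
noisy = clean + [9800.0]  # e.g. a misplaced decimal point
```

With this kind of aggregate, the index computed from `clean` and from `noisy` comes out the same, which is the property that makes an imperfect extractor usable for indexing.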

The voice agents in development right now feel 100x better than the current chatbots deployed by companies.

I had the same opinion till a few months ago; now I would prefer the [redacted company so as to not give free marketing] AI agent. You’ll start seeing this wave in around 3-6 months, as most are in trials.


Just sounds like gassing because you are invested in it yourself.

Most support agents lack... well, agency. If you connect a chatbot to an FAQ, that's exactly what you get. Just another instance of enterprise software being badly designed, badly written etc. It doesn't mean that it's actually an impossible problem.

They won't ever give agents the ability to actually do things for customers that can impact the company in some kind of negative fashion. At least not willingly.

That's sort of the whole point of talking to customer service though. Getting something done that you want that involves them having to do work for you. AKA you taking value from the company.

So yeah they're basically always going to be useless garbage if put together according to business requirements.

Other services should just be automated already.


They'll do the same thing we do in software development - proper sandboxing, context curation, reviews on high impact actions. I presume real customer service is really expensive, as I've seen many companies prefer to just quickly refund, or drop you as a customer entirely, rather than fix your problem. It can't get much worse than that.
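A minimal sketch of what "reviews on high impact actions" could look like for a support agent. The action kinds, threshold, and return values are all made up for illustration; the idea is just that anything with a real cost gets routed to a human review queue instead of executing automatically:

```python
from dataclasses import dataclass

@dataclass
class Action:
    kind: str       # hypothetical action types, e.g. "refund", "faq_answer"
    amount: float = 0.0

# Hypothetical policy: low-impact actions run automatically,
# anything costing more than the threshold is queued for human review.
REVIEW_THRESHOLD = 50.0

def dispatch(action: Action) -> str:
    """Gate high-impact agent actions behind a human review step."""
    if action.kind == "refund" and action.amount > REVIEW_THRESHOLD:
        return "queued_for_review"
    return "auto_executed"
```

So a small refund goes through on its own, while a large one waits for a human, which is the same sandboxing/approval pattern used for agent-written code.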

"real games"

Instead they're getting worse yay! Hop on Kalshi

HN: Please post substantive thoughtful replies only in these AI threads.

Also HN: LLM AI generated sloppa with errors even on top of front page.

clown emoji


What's strange about this is that tons of the upvoted posts on the front-page are LLM generated text

So....?


Or the reality, which is that the numbers are royally fudged and the statistic is a farce.


It's not that the numbers are a farce, but different industry segments are doing better or worse than others.

HN being a tech forum that now increasingly skews East and Midwest (heck, it's not even 7am yet in the West, but look at the degree of engagement on here) means most HNers are impacted by a slowdown in tech hiring, which exacerbates the sense of pessimism.

And tbf, if you aren't working in a tech hub like the Bay or NYC, you are going to be screwed if you are laid off - employers increasingly restrict remote work to those employees who have proven internal track records, and inshoring hubs like in RTP, Denver, Atlanta, etc are on the chopping block.



2. Translating between two different coding languages (migration)

I have a game written in XNA

100% of the code is there, including all the physics that I hand-wrote.

All the assets are there.

I tried to get Gemini and Claude to do it numerous times, always with utter failure of epic proportions on anything that's actually detailed:

- my transition from the lobby screen into gameplay? 0% replicated on all attempts

- the actual physics in gameplay? 0% replicated, none of it works

- the lobby screen itself? non-functional

Okay so what did it even do? Well it put together sort of a boilerplate main menu and barebones options with weird looking text that isn't what I provided (given that I provided a font file), a lobby that I had to manually adjust numerous times before it could get into gameplay, and then nonfunctional gameplay that only handles directional movement and nothing else with sort of half-working fish traveling behavior.

I've tried this a dozen times since 2023 with AI and as late as late last year.

ALL of the source code is there; every single thing that could be translated to make a functional game in another language is there. It NEVER once works or even comes remotely close.

The entire codebase is about 20,000 lines, with maybe 3,000 of it being really important stuff.

So yeah I don't really think AI is "really good" at anything complex. I haven't really been proven wrong in my 4 years of using it now.


I crave to see people saying "Here's the repo btw: ..." and others trying to port it over, just so we see all of the ways AI fails (and how each model does), and maybe somewhere in there a few ways to improve its odds. Until it eventually gets included in training data, a bit like how LLMs are oddly good at making SVGs of pelicans on bicycles nowadays.

And then, maybe someone slightly crazy comes along and tries seeing how much they can do with regular codegen approaches, without any LLMs in the mix, but also not manual porting.


Agreed -- coding agents / LLMs are definitely imperfect, but it's always hard to contextualize "it failed at X" without knowing exactly what X was (or how the agent was instructed to perform X)


I'm sure someone who regularly programs games in the destination language I want, and who has also worked with XNA in the past as a game developer, could port it in a week or something, yeah


- Split it in different modules / tasks

- Do not say: "just convert this"

- On critical sections you do a method-per-method-translation

- Don't forget: your 20,000-line source taken as a whole will distract any model on longer tasks (and sessions, for sure)

- Do dedicated projects within Claude for each sub-module
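The splitting advice above can be sketched as a small task generator. This toy uses Python's ast module on Python source purely for simplicity (the project under discussion is XNA/C#, where you'd need a C# parser instead), and the prompt text is a placeholder:

```python
import ast

def method_translation_tasks(source: str, module_name: str):
    """Split a module into one translation task per function,
    so each prompt stays small and focused instead of feeding
    the model the whole codebase at once."""
    tree = ast.parse(source)
    tasks = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            snippet = ast.get_source_segment(source, node)
            tasks.append({
                "module": module_name,
                "name": node.name,
                "prompt": ("Translate this function, preserving "
                           "behaviour exactly:\n" + snippet),
            })
    return tasks

# Two tiny functions stand in for a real module.
demo = "def jump(v):\n    return v * 2\n\ndef land(v):\n    return v - 1\n"
tasks = method_translation_tasks(demo, "physics")
```

Each task can then be fed to the model independently, and the per-method granularity is exactly the "method-per-method translation" the list recommends for critical sections.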


This matches my experience. Unless it's been done to death online (crud etc) it falls on its face every time.


It's okay, shills and those with stock in the AI companies brainwashed themselves inside out and will spam forever on this website that if you just introduce the right test conditions agents can do anything. Never mind engineering considerations; if it passes the tests it's good, homie! Just spend an extra few hundred or thousand a month on it! Especially on the company I have stock in! Give me money!


Yes, you are right: amongst the four points, migration is the most contentious one. You need to be fairly prudent about migration and depending on the project complexity, it may or may not work.

But I do feel this is a solvable problem long term.


In those situations you basically need to guide the LLM to do it properly. It rarely one-shots complex problems like this, especially outside web dev, but it could still make it faster than doing it manually.


Oh believe me I broke it down super finely, down to single files and even single functions in some places

It still is completely and utterly hopeless


I've done this multiple times in various codebases, both medium sized personal ones (approx 50k lines for one project, and a smaller 20k line one earlier) and am currently in the process of doing a similar migration at work (~1.4 million lines, but we didn't migrate the whole thing, more like 300k of it).

I found success with it pretty easily for those smaller projects. They were gamedev projects, and the process was basically to generate a source of truth AST and diff it vs a target language AST, and then do some more verifier steps of comparing log output, screenshot output, and getting it to write integration tests. I wrote up a bit of a blog on it. I'm not sure if this will be of any use to you, maybe your case is more difficult, but anyway here you go: https://sigsegv.land/blog/migrating-typescript-to-csharp-acc...

For me it worked great, and I would (and am) using a similar method for more projects.


"I also wanted to build a LOT of unit tests, integration tests, and static validation. From a bit of prior experience I found that this is where AI tooling really shines, and it can write tests with far more patience that I ever could. This lets it build up a large hoard of regression and correctness tests that help when I want to implement more things later and the codebase grows."

The tests it writes are, in my experience, extremely terrible, even with verbose descriptions of what they should do. Every single test I've ever written with an LLM I've had to modify manually to adjust it or straight up redo it. This was as recent as a couple months ago for a C# MAUI project, doing Playwright-style UI-based functionality testing.

I'm not sure your AST idea would work for my scenario. I'd be wanting to convert XNA game-play code to PhaserJS. It wouldn't even be close to 95% similar. Several things done manually in XNA would just be automated away with PhaserJS built-ins.


Ya, I could see where framework patterns and such will need a lot of corrections in post after that type of migration. For mine it was the other direction and only the server portion (an Express server written in TypeScript for a Phaser game, ported to Kestrel on C#, which was able to use pretty much identical code; at the end, after it was done, I just switched over and refactored a few things to make it more idiomatic C#).

For the tests, I'm not sure why we have such different results, but essentially it took a codebase I had no tests in, and in the port it one-shot a ton of tests that have already helped me in adding new features. My game server for it runs in Kubernetes and has an "auto-distribute" system that matches players to servers and redistributes them if one server is taken offline. The integration tests it wrote for that auto-distribute system found a legit race condition that was present in both the old and new code (it migrated accurately enough to have the same bugs), and as part of implementing that test it fixed the bug.

Of course I wouldn't use it if it wasn't a good tool but for me the difference between doing this port via this method versus doing it manually in prior massive projects was such an insane time save that I would have been crazy to do it any other way. I'm super happy with the new code and after also getting the test infra and stuff like that up it's honestly a huge upgrade from my original code that I thought I had so painstakingly crafted.


super cool, don't have the time to read it right now but to think in terms of ASTs is pretty handy!


The only model that works well for complex things is Opus, and even then barely (but it does, and you need to use API/token pricing if you want to guarantee it's the real thing).


The solution to fostering relationships is getting off the computer and your phone and social media for a prolonged period of time.


Yes!


They are all guilty.

