
Cool. Since ChatGPT 4o is actually really good at this particular shape identification task, what, if anything, do you conclude about its intelligence?

Recognizing triangles isn't that impressive. The real question is the ceiling of complexity of the patterns in data it can identify. Give it a list of randomly generated xyz coords that fall on a geometric shape, or a list of points that sample the Earth's trajectory around the Sun. Will it tell you that it's an ellipse? Will it derive Newton's second law? Will it notice the deviation from the ellipse and find the rule explaining it?
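
For concreteness, a minimal sketch of the kind of prompt data I have in mind (the orbit parameters and point count are illustrative, not exact values):

  # Generate xyz points lying on an ellipse roughly shaped like Earth's orbit (in AU),
  # to paste into a prompt and ask the model what curve they lie on.
  import math, random

  a = 1.0       # semi-major axis, AU (illustrative)
  e = 0.0167    # eccentricity (illustrative)
  b = a * math.sqrt(1 - e**2)

  points = []
  for _ in range(50):
      t = random.uniform(0, 2 * math.pi)
      # Ellipse centred on the origin; a real orbit would put the Sun at a focus.
      points.append((round(a * math.cos(t), 5), round(b * math.sin(t), 5), 0.0))

  print(points)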

The entire point here is that LLMs and image recognition software are not managing this task, so they are not really good at this particular shape identification task.

No, the post's article is not about the sort of shape identification task discussed by GP. Or indeed any image recognition task: it's a paper about removed context in language.

Fwiw, I did test GP's task on ChatGPT 4o directly before writing my comment. It is as good at it as any human.


That's _exactly_ what the LLM did: the article's authors decided to count that as a failure.

Hm, I was only reading TFA, not the research paper. But TFA mentions this:

  Perhaps the most unsettling finding is what failure looks like. Even when models are completely wrong, they sound persuasive. The reasoning is fluent, the explanations are structured, and the conclusions are confidently delivered. But the logic doesn’t hold.

That sounds a lot like a salesperson. And yes, there is a human tendency to twist reasoning to make the written word look polished, and I don’t think LLM training has fixed that bias.

It's very regional.

Buses meant for right-hand drive markets like the UK, Australia and Japan are (with very few exceptions) shaky, low-entry-configuration two-door junk, or double-decker one-door junk.

If that's what you're used to, even the most rickety light rail system will feel luxurious by comparison.

Left-hand drive low-floor buses with three doors, and articulated models with four doors, which are intended for the European market, are as a general rule much more comfortable. If the buses you normally take fit the latter description, you'll probably find the average tram worse.


Needs (2024) in the title.


Since developers working at places like Parkes Observatory use LLMs regularly, it seems like experience ("13-year-olds" versus "senior engineers" at the two extremes) doesn't explain this gap as well as you imply.

The other hypotheses in this thread (e.g. that it's largely a matter of programming language) seem much more plausible.


When it comes to full-on vibe coding (Claude Code with accept all edits), my criterion is whether I will be held responsible for the complexity introduced by the code. When I've been commissioned to write backend APIs, "the buck stops with me" and I will have to be able to personally explain potentially any architectural decision to technical people. On the other hand, for a "demo-only" NextJS web app that I was hired to do for non-technical people (meaning they won't ever look at the code), I can fully vibe code it. I don't even want to know what complexity and decisions the AI has introduced for me, but as far as I am concerned, that will be a secret forever.


Everyone can use these tools to deepen knowledge and enhance output.

But there is a difference between using LLMs and relying on LLMs. The hype is geared toward the idea that we can rely on these tools to do all the work for us and fire everyone, but it's bollocks.

It becomes an increasingly ridiculous proposition as the work becomes more specialized, in-depth, cross-functional, regulated, and critical.

You can use it to help at any level of complexity, but nobody is going to vibe code a flight control system.


FWIW I fully agree.


I don't even lean toward the worst-case AI narratives, but it sure feels like Economist journos will keep pushing out "here's why AI won't take your job" articles, even as their own writers get quietly pushed out by ChatGPT creeping across their open-plan office one desk at a time.

In this piece, they lean heavily on precious "official American data", and celebrate the increased number of people working in translation, while conveniently ignoring more telling figures, such as the total amount those translators actually earn now per unit of work.

My partner works in university administration, and their "official data" tells a much spicier story. Their university still ranks highest in our country for placing computer engineering grads within six months of graduation. But over just six terms, the number of graduates in employment within six months dropped by half. That's not a soft decline by any means, more like the system breaking in real time.


I'm on the other side of the fence in BigCo and from where I'm sitting we're not hiring any Americans because we're in the middle of the biggest outsourcing push I've ever seen.

The take seems to be "if your job can be done from Lake Tahoe, it can be done from Bangalore". What's different this time around is that the entire tech organization is being outsourced, leadership and all. Additionally, Data Science and other major tech-adjacent roles are also affected.

For us, the hiring rate for tech and tech-adjacent roles has been zero in the US for several years. 100% of that is attributable to outsourcing, 0% to AI.


I'm seeing the same thing. We have a formal IT hiring freeze, all jobs are moving overseas. However, AI has not been eating the jobs, just traditional outsourcing.


If we were in the boom times, less hiring would be a convincing signal. But the global economy is toast right now. There are very, very good reasons not to hire engineers, and it's plausible AI has nothing to do with it.

Anecdotally there’s no way AI has enabled me to replace a junior hire.

AI has major problems, although it's a fantastic tool. Right now I'm seeing a boost similar to the emergence of Stack Overflow. That might increase, but even then we may just see higher productivity.


I think it'll be a Jevons' paradox thing. If developer productivity improves, that just increases the scope of possible projects.


I think recent history has made Solow's paradox more interesting than Jevons', which seems to be mostly a thing talked about by people with something to sell related to AI, and less so by economists. It seems to have applied much better early in the industrial revolution. I'm not sure economists even work on Jevons' paradox anymore (or whether it was ever a very interesting topic for them; the writing on it seems very sparse in comparison).


In either case, developers won't have to worry about career longevity.


But what new projects do you need?

The West is in a world of abundance. We do not need 5 more ChatGPTs. It's better business to have one half-price ChatGPT than 3 full-priced ones.

Jevons' paradox requires a very large unmet consumer demand.


I work in network security, specifically on the automation team. As we articulate more and more processes, products, and monitoring together, new demand is created and our scope grows (unlike our team, right now).

Being able to automatically write unit tests with minor input on my part, creating mocks, and sometimes (mostly on front-end work or on basic interfaces) even generating code makes me more 'productive' (not a huge increase, but we work with a lot of proprietary stuff), and I'm okay with it. I also use it as a rubber duck, on advice from someone on HN, and it was a great idea.


> Everything that can be invented has been invented.

-Charles H. Duell, Commissioner of the US Patent Office, circa 1899


It's hard to predict the details of emergent phenomena.


I came here to contradict myself.

A dev camp CEO I know got in touch this week. They are seeing junior devs turned away due to AI.

Yes, it's assistive, but the issue with juniors in particular is that the entry-level tasks they would usually do are easier done with AI.


I think a ton of hidden signals of a waning economy are being obscured by AI and globalization talk. Sure, some attempts to globalize and use more AI are genuine, but those are still cost cutting measures. And we cut costs when things aren't going well. There's just no way a brand new tech has penetrated the market enough to depress every sector of tech- or language-adjacent employment.


There's also something about tariffs and gutting government investment I've been hearing about from the US. I'm no economist, but it's possible that might have something to do with the economy waning.


I do not disagree with your broader point, but it's worth noting that The Economist article deliberately framed its analysis around datasets that also wouldn't capture the economic slowdown!

That choice itself is telling.


>even as their own writers get quietly pushed out by ChatGPT

Have any of the Economist's writers been replaced by ChatGPT?


"will keep pushing" is future tense. The entire sentence is correct and has a well defined meaning.


The word "keep" here is synonymous with "continue," implying that this is already happening. It's fair game to ask if that's actually the case.

And this is by the way, but English sentences almost always have some degree of ambiguity; to talk about "well defined meaning" in the context of natural languages is to make a category error.


It's ambiguous as to whether the clause after "as" is in the present tense or future since "will keep pushing" presupposes the action is already happening.


I'm not sure if that can be attributed to AI or the ongoing recession.


Tech layoffs were happening before the AI hype exploded.


ZIRP hangover.


I'm not sure either. I'm pointing out that The Economist is presenting misleading metrics deliberately: based on the "reliable American data" that they chose, you wouldn't see evidence of an ongoing recession either!


Regarding the last point, that doesn’t mean the jobs are replaced by AI though.

A lot of companies aren’t necessarily replacing jobs with AI. They’re opening development offices in Europe, India, and South America.


Do you really think half of these grads aren't getting jobs because of AI replacing coders, in the short time these coding assistants have been available?

I mean, I've tried Claude Code - it's impressive and could be a great helper and assistant, but I still can't see it replacing such a large number of engineers. I still have to look over the output and check that it's not spitting out garbage.

I would guess basic coding will be replaced somewhat, but you still need people who guide the AI and detect problems.


You can't build a business on scaling out the hiring of 1000s of AI experts. You only need so many, which is why they get higher salaries. There will never be an Infosys or Tata for such workers like there was for many of us mere coders. Infosys and Tata will likely benefit, but their average worker will not.


Well, these "AI experts" are just senior devs, and without first being a junior, you'll never become a senior. So there will be junior devs. They might not cut their teeth on CRUD apps anymore, but we will definitely have them.


I can see current models replacing *fresh graduates*, on the basis of what I've seen from various fresh graduates over the last 20 years.

I don't disagree that models make a lot of eye-rolling mistakes; it's just that I've seen such mistakes from juniors also, and this kind of AI is a junior in all fields simultaneously (unlike real graduates, who are mediocre at only one thing and useless at the rest) and costs peanuts: literally, priced at an actual bag of peanuts.

Humans do have an advantage very quickly once they actually get any real-world experience, so I would also say it's not yet as good as someone with even just 2 years experience — but this comes with a caveat: I'm usually a few months behind on model quality, on the grounds that I think there's not much point paying for the latest and greatest when the best becomes obsolete (and thus free) over that timescale.


These state-of-the-art models are barely able to code an MVP app without tons of hand-holding. Do you really think new grads are getting replaced by AI? I only see statements like these coming from the likes of Elon Musk.


I think that's the problem. The people who have the keys to the money and do the hiring are often off the tools and have no real grounding in the capabilities of the current generation of LLMs. They make decisions about how much, or how little, to hire based on the junk they see from the Elon Musk types.


This would be a wrong argument even if your premise about Switzerland were factually true (it's not).

It's like praising Danish architecture for its earthquake-resistance since no Danish building ever collapsed in an earthquake. It fails to account for the fact that Denmark never gets any significant earthquakes.

You can't tell how good a system is at resisting descent into authoritarian rule unless wannabe-autocrats have tried several times, amassed some support to achieve their goals, and the democratic institutions have held against them. This never happened in Switzerland, not even in the 1930s: the ability of the Swiss constitution to prevent authoritarian backsliding is untested.

(But as a side note, what you're saying is not factually correct. The Swiss constitution is from 1848, and before Napoleon only Schwyz, Uri and Unterwalden would be considered non-authoritarian. Many cantons, like Bern, were ruled by birthright autocratic families and had no popular vote whatsoever.)


I'm sorry I haven't answered your comment so far: even though it is mistaken in many ways, it seems to be in good faith, and I think it deserves an answer.


CMU professors can't build AI agents, and decide to brag about it. That's the article.

"We tried something, and we couldn't make it work. Therefore it must be impossible to do."

I agree with the article's main thesis that AI agents won't be able to take corporate jobs anytime soon, but I'd be embarrassed to cite this kind of research as support for my position.


It’s not entirely clear from the write up in the article, but it sounds like this was intended as a test of existing “off the shelf” AI agent models. In other words, the aim is to find out what happens if you try to use the existing commercially available technology (which of course is what most people would be doing).


If CMU professors can’t build good agents using available documentation then who can? Not their fault the state of the tooling is what it is.


Levy war, conclude peace, contract alliances, establish commerce. Also, mandate schooling, mandate vaccinations, forbid polluting the environment, set clocks forward, and various other acts.

Governments exist so that we can coordinate doing things that individually we couldn't do or wouldn't wish to do. It's us.

The problem with the tariffs is not that they're decreed by the government that your fellow citizens elected, it's that they're counterproductive to the extent that they're causing a recession.


OpenAI o3 and o4-mini are massive disappointments for me so far. I have a private benchmark of 10 proof-based geometric group theory questions that I throw at new models upon release.

Both new models gave inconsistent answers, always with wrong or fake proofs, or using assumptions that are not in the question and are often outright unsatisfiable.

The now inaccessible o3-mini was not great, but much better than o3 and o4-mini at these questions: o3-mini can give approximately correct proof sketches for half of them, whereas I can't get a single correct proof sketch out of o3 full. o4-mini performs slightly worse than o3-mini. I think the allegations that OpenAI cheated FrontierMath have unambiguously been proven correct by this release.

