This reminds me of when I tried to let Claude port an Android libgdx-based game to a WASM-based libgdx version, so I could play the game in the browser.

No matter how much I tried to force it to stick to a mostly line-by-line port, it kept trying to "improve" the code. At some point it had to undo everything as it introduced a number of bugs. I asked it: "What should I add to your prompt so you won't do this again?" and it gave me this:

  ### CRITICAL LESSON: Don't "Improve" During Porting
  -  **BIGGEST MISTAKE: Reorganizing working code**
    - **What I did wrong:** Tried to "simplify" by splitting `createStartButton()` into separate creation and layout methods
    - **Why it failed:** Introduced THREE bugs:
      1. Layout overlap (getY() vs getY() - getHeight())
      2. Children not sized (Group.setSize() doesn't affect children)
      3. Origins not updated (scaling animations broken)
    - **The fix:** Deleted my "improvements" and copied the original Android pattern faithfully
    - **Root cause:** Arrogance - assuming I could improve production-tested code without understanding all the constraints
    - **Solution:** **FOLLOW THE PORTING PRINCIPLES ABOVE** - copy first, don't reorganize
    - **Time wasted:** ~1 hour debugging self-inflicted bugs that wouldn't exist if I'd just copied the original
    - **Key insight:** The original Android code is correct and battle-tested. Your "improvements" are bugs waiting to happen.

I like Claude's self-reflection; unfortunately, even adding this to CLAUDE.md didn't fix it. It kept taking wrong turns, so I had to abandon the effort.

Claude doesn't know why it acted the way it did; it is only predicting why it acted. I see people falling for this trap all the time.

It's not even predicting why it acted; it's predicting an explanation of why it acted, which is even worse, since there's no consistent mental model.

It has been shown that LLMs don't know how they work. Researchers asked an LLM to perform computations and explain how it got to the result. The LLM's explanation was typical of how we do it: add the numbers digit by digit, with carry, etc. But looking inside the neural network showed that the reality is completely different and much messier. None of this is surprising.

Still, feeding it back its own completely made-up self-reflection could be an effective strategy; reasoning models kind of work like this.


Right. Last time I checked this was easy to demonstrate with word logic problems:

"Adam has two apples and Ben has four bananas. Cliff has two pieces of cardboard. How many pieces of fruit do they have?" (or slightly more complex, this would probably be easily solved, but you get my drift.)

Change the wording to something entirely random, i.e. something not likely to be found in the LLM's corpus, like walruses and skyscrapers and carbon molecules, and the LLM will give you a suitably nonsensical answer, showing that it is incapable of handling even simple substitutions that a middle schooler would recognize.


The explanation becomes part of the context, which can lead to more effective results in the next turn. It does work, but it does so in a completely misleading way.

Which should be expected, since the same is true for humans. The "adding numbers digit by digit with carry" method works well on paper, but it's not an effective method for doing math in your head, and it's certainly not how I calculate 14+17. In fact, I can't really tell you how I calculate 14+17, since that's not in the "inner monologue" part of my brain, and I have little introspection into any of the other parts.

Still, feeding humans their completely made-up self-reflection back can be an effective strategy


The difference is that if you are honest and pragmatic and someone asked you how you added two numbers, you would only say you did long addition if that's what you actually did. If you had no idea what you actually did, you would probably say something like "the answer came to me naturally".

LLMs work differently. Like a human, 14+17=31 may come naturally, but when asked about their thought process, LLMs will not self-reflect on their own processing; instead, they will treat it like "in your training data, when someone is asked how they added numbers, what follows?", and usually it is long addition, so that is the answer you will get.

It is the same idea as why LLMs hallucinate. They will imitate what their dataset has to say, and their dataset doesn't have a lot of "I don't know" answers, and an LLM that learns to answer "I don't know" to every question wouldn't be very useful anyway.


>if you are honest and pragmatic and someone asked you how you added two numbers, you would only say you did long addition if that's what you actually did. If you had no idea what you actually did, you would probably say something like "the answer came to me naturally".

To me that misses the argument of the above comment. The key insight is that neither humans nor LLMs can express what actually happens inside their neural networks, but both have been taught to express e.g. addition using mathematical methods that can easily be verified. It still doesn't guarantee that either of them won't make mistakes; it only makes it reasonably possible for others to catch those mistakes. Always remember: All (mental) models are wrong. Some models are useful.


Life lesson for you: the internal functions of every individual's mind are unique. Your n=1 perspective is in no way representative of how humans as a category experience the world.

Plenty of humans do use longhand arithmetic methods in their heads. There's an entire universe of mental arithmetic methods. I use a geometric process because my brain likes problems to fit into a spatial graph instead of an imaginary sheet of paper.

Claiming you've not examined your own mental machinery is... concerning. Introspection is an important part of human psychological development. Like any machine, you will learn to use your brain better if you take a peek under the hood.


> Claiming you've not examined your own mental machinery is... concerning

The example was carefully chosen. I can introspect how I calculate 356*532. But I can't introspect how I calculate 14+17 or 1+3. I can deliberate on the question 14+17 more carefully, switching from "system 1" to "system 2" thinking (yes, I'm aware that that's a flawed theory), but that's not how I'd normally solve it. Similarly, I can describe to you how I count six eggs in a row, but I can't describe to you how I count three eggs in a row. Sure, I know I'm subitizing, but that's just putting a word on "I know how many are there without conscious effort". And without conscious effort I can't introspect it. I can switch to a process I can introspect, but that's not at all the same.


Yes, this pitfall is a hard one. It is very easy to interpret the LLM in a way there is no real ground for.

It must be anthropomorphization that's hard to shake off.

If you understand how this all works, it's really no surprise that post-factum reasoning is exactly as hallucinated as the answer itself: it might have very little to do with the answer, and it never has anything to do with how the answer actually came to be.

The value of "thinking" before giving an answer is reserving a scratchpad for the model to write some intermediate information down. There isn't any actual reasoning even there. The model might use the information it writes there in a completely obscure way (one that has nothing to do with what's verbally there) while generating the actual answer.


That's because when the failure becomes the context, it can clearly express the intent of not falling for it again. However, when the original problem is the context, none of this obviousness applies.

Very typical, and it gives LLMs an annoying Captain Hindsight-like behaviour.


IDK how far AIs are from intelligence, but they are close enough that there is no room for anthropomorphizing them. When they are anthropomorphized, it's assumed to be a misunderstanding of how they work.

Whereas someone might say "geeze my computer really hates me today" if it's slow to start, and we wouldn't feel the need to explain the computer cannot actually feel hatred. We understand the analogy.

I mean, your distinction is totally valid and I don't blame you for observing it, because I think there is a huge misunderstanding. But when I have the same thought, it often occurs to me that people aren't necessarily speaking literally.


This is a sort of interesting point: it's true that knowingly metaphorical anthropomorphisation is hard to distinguish from genuine anthropomorphisation with these models, and that's food for thought, but the actual situation here just isn't applicable to it. This is a very specific mistaken conception that people make all the time. The OP explicitly thought that the model would know why it did the wrong thing, or at least followed a strategy adjacent to that misunderstanding. He was surprised that adding extra slop to the prompt was no more effective than telling it what to do himself. It's not a figure of speech.

A good time to quote our dear leader:

> No one gets in trouble for saying that 2 + 2 is 5, or that people in Pittsburgh are ten feet tall. Such obviously false statements might be treated as jokes, or at worst as evidence of insanity, but they are not likely to make anyone mad. The statements that make people mad are the ones they worry might be believed. I suspect the statements that make people maddest are those they worry might be true.

People are upset when AIs are anthropomorphized because they feel threatened by the idea that they might actually be intelligent.

Hence the woefully insufficient descriptions of AIs such as "next token predictors" which are about as fitting as describing Terry Tao as an advanced gastrointestinal processor.


I'm not threatened by the idea that LLMs might actually be intelligent. I know they're not.

I'm threatened by other people wrongly believing that LLMs possess elements of intelligence that they simply do not.

Anthropomorphosis of LLMs is easy, seductive, and wrong. And therefore dangerous.


The comment you replied to made a point that, if you accept it (which you probably should), makes that PG quote inapplicable here. The issue in this case is that treating the model as though it has useful insight into its own operation - which is being summarized as anthropomorphizing - leads to incorrect conclusions. It’s just a mistake, that’s all.

There's this underlying assumption of consistency too - people seem to easily grasp that when starting on a task the LLM could go in a completely unexpected direction, but once that direction has been set, a lot of people expect the model to stay consistent. The confidence with which it answers questions plays tricks on the interlocutor.

What's not a figure of speech?

I am speaking in general terms, not just about this conversation here. The only specific figure of speech I see in the original comment is "self-reflection", which doesn't seem to be in question here.


Some models are capable of metacognition. I've seen Anthropic's research replicated.

Can you elaborate on what you mean by metacognition and where you’ve seen it in Anthropic’s models?

It’s not even doing that. It’s just an algorithm for predicting the next word. It doesn’t have emotions or actually think. So I had to chuckle when it said it was arrogant. Basically, its training data contains a bunch of postmortem write-ups, and it’s using those as a template for what text to generate and telling us what we want to hear.

Worth pointing out that your IDE/plugin usually adds a whole bunch of prompts before yours, not to mention the prompts that the model hosting provider prepends as well.

This might be what is encouraging the agent to apply "best practices" like improvements. Looking at mine:

>You are a highly sophisticated automated coding agent with expert-level knowledge across many different programming languages and frameworks and software engineering tasks - this encompasses debugging issues, implementing new features, restructuring code, and providing code explanations, among other engineering activities.

I could imagine that an LLM could well interpret that to mean it should improve things as it goes. Models (like humans) don't respond well to instructions in the negative (don't think about pink monkeys - now we're both thinking about them).


It's also common for your own CLAUDE.md to have some generic line like "Always use best practices and good software design" that gets in the way of other prompts.

For anything large like this, I think it's critical that you port over the tests first, and then essentially force it to get the tests passing without mutating the tests. This works nicely for stuff that's very purely functional; it's a lot harder with a GUI app, though.

The same insight can be applied to the codebase itself.

When you're porting the tests, you're not actually working on the app. You're getting it to work on some other adjacent, highly useful thing that supports app development, but nonetheless is not the app.

Rather than trying to get the language model to output constructs in the target PL/ecosystem that go against its training, get it to write a source code processor that you can then run on the original codebase to mechanically translate it into the target PL.

Not only does this work around the problem where you can't manage to convince the fuzzy machine to reliably follow a mechanical process, it sidesteps problems around the question of authorship. If a binary that has been mechanically translated from source into executable by a conventional compiler inherits the same rightsholder/IP status as the source code that it was mechanically translated from, then a mechanical translation by a source-to-source compiler shouldn't be any different, no matter what the model was trained on. Worst case scenario, you have to concede that your source processor belongs to the public domain (or unknowingly infringed someone else's IP), but you should still be able to keep both versions of your codebase, one in each language.
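A minimal sketch of what such a processor might look like (Rust here, with hypothetical file names and a deliberately naive substitution table; a real translator would work on a parse tree rather than on raw text):

  use std::fs;

  // Toy line-level rewrite rules. The mapping below is made up purely for
  // illustration; the point is that the translation is mechanical and repeatable.
  const RULES: &[(&str, &str)] = &[
      ("System.out.println", "console.log"),
      ("public class ", "export class "),
      ("private ", ""),
  ];

  fn translate_line(line: &str) -> String {
      let mut out = line.to_string();
      for &(from, to) in RULES {
          out = out.replace(from, to);
      }
      out
  }

  fn main() -> std::io::Result<()> {
      // Hypothetical paths; point these at the real codebase.
      let src = fs::read_to_string("Game.java")?;
      let translated: String = src.lines().map(|l| translate_line(l) + "\n").collect();
      fs::write("Game.ts", translated)?;
      Ok(())
  }

Because the processor itself is deterministic and reviewable, you can iterate on it until the output compiles, instead of iterating on the model's memory of your instructions.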


One thing that might be effective at limited-interaction recovery-from-ignoring-CLAUDE.md is the code-review plugin [1], which spawns agents who check that the changes conform to rules specified in CLAUDE.md.

[1] https://github.com/anthropics/claude-code/blob/main/plugins/...


I recently did a C++ to Rust port with Gemini, and it was basically a straight-line port like I wanted. Nearly 10k lines of code, too. It needed to change a bit of structure to get it compiling, but that's only because Rust found bugs at compile time. I attribute this success to the fact that my team writes C++ stylistically close to idiomatic Rust, and that the languages are generally quite similar. I will likely do another pass in the future to turn the callback-driven async into async/await syntax, but off the bat it largely avoided doing so when it would change code structure.

It's not context-free (haha), but a trick you can try is to include negative examples in the prompt. It used to be an awful trick because of the Waluigi effect, but then it became a good one, and lately with Opus 4.5 I haven't needed it that much. But it did work once: e.g. take the original code, supply the correct answer and the wrong answers as examples in CLAUDE.md, and then redo.

If it works, do share.


Humans act the same way.

For all the (unfortunately necessary) conversations that have occurred over the years of the form, "JavaScript is not Java—they're two different languages," people sometimes go too far and tack on some remark like, "They're not even close to being alike." The reality, though, is that many times you can take some in-house package (though not the Enterprise-hardened™ ones with six different overloads for every constructor, and four for every method, and that buy hard into Java (or .NET) platform peculiarities—just the ones where someone wrote just enough code to make the thing work in that late-90's OOP style associated with Java), and more or less do a line-by-line port until you end up with a native JS version of the same program, which with a little more work will be able to run in browser/Node/GraalJS/GJS/QuickJS/etc. Generally, you can get halfway there by just erasing the types and changing the class/method declarations to conform to the different syntax.

Even so, there's something that happens in folks' brains that causes them to become deranged and stray far off-course. They never just take their program, where they've already decomposed the solution to a given problem into parts (that have already been written!), and then just write it out again—same components, same identifier names, same class structure. There's evidently some compulsion where, because they sense the absence of guardrails from the original language, they just go absolutely wild, turning out code that no one would or should want to read—especially not other programmers hailing from the same milieu who explicitly, avowedly, and loudly state their distaste for "JS" (whereby they mean "the kind of code that's pervasive on GitHub and NPM" and is so hated exactly because it's written in the style their coworker, who has otherwise outwardly appeared to be sane up to this point, just dropped on the team).


Was this Claude Code? If you tried it with one file at a time in the chat UI I think you would get a straight-line port, no?

Edit: It could be because Rust works a little differently from other languages; a 1:1 port is not always possible or idiomatic. I haven't done much with Rust, but whenever I try porting something to Rust with LLMs, it imports like 20 cargo crates first (even when there were no dependencies in the original language).

Also, Rust for gamedev was a painful experience for me, because Rust hates globals (and has nanny totalitarianism, so there's no way to tell it "actually I am an adult, let me do the thing"), so you have to do weird workarounds for it. GPT started telling me some insane things like, oh, it's simple, you just need this Rube Goldberg machine of macro crates. I thought it was tripping balls until I joined a Rust Discord and got the same advice. I just switched back to TS and redid the whole thing on the last day of the jam.


> rust hates globals

Rust has added OnceCell and OnceLock recently to make thread-safe globals a lot easier for some things. It's not "hate"; it just wants you to be consistent about what you're doing.
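For instance, a lazily initialized, thread-safe global with std::sync::OnceLock needs no external crates at all (a minimal sketch; the env-var lookup is just a placeholder for whatever the global actually holds):

  use std::sync::OnceLock;

  // A lazily initialized, thread-safe global; no macro crates required.
  static CONFIG: OnceLock<String> = OnceLock::new();

  fn config() -> &'static str {
      // The closure runs at most once, even when called from multiple threads.
      CONFIG.get_or_init(|| {
          std::env::var("GAME_CONFIG").unwrap_or_else(|_| "default".to_string())
      })
  }

  fn main() {
      println!("config = {}", config());
  }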


That’s a terrible prompt, more focused on flagellating itself for getting things wrong than actually documenting and instructing what’s needed in future sessions. Not surprising it doesn’t help.

Sonnet 4.5 had this problem. Opus 4.5 is much better at focusing on the task instead of getting sidetracked.

I wish there was a feature to say "you must re-read X" after each compaction.

Some people use hooks for that. I just avoid CC and use Codex.


Getting the context full to the point of compaction probably means you're already dealing with a severely degraded model; the more effective approach is to work in chunks that don't come close to filling the context window.

The problem is that I'm not always using it interactively. I'll give it something that I think is going to be a simple task and it turns out to be complex. It overruns the context, compacts, and then starts doing dumb things.

There's no PostCompact hook, unfortunately. You could try PreCompact, giving back a message saying it's super duper important to re-read X, and hope that survives the compaction.

What would it even mean to "re-read after a compaction"?

To enter a file into the context after losing it through compaction.

Tangential but doesn't libgdx have native web support?

It doesn't seem very bound by CLAUDE.md

libGDX, now that's a name I haven't heard in a while.

Well, it's close to AGI; can you really expect AGI to follow simple instructions from dumbos like you when it can do the work of god?

As an old coworker once said when talking about a certain manager: that boy's just smart enough to be dumb as shit. (The AI, not you; I don't know you well enough to call you dumb.)

I wanted to know more about current web speech recognition APIs. I tried three different coding agents to set up a proof of concept app for me.

To my surprise, Cursor's composer model beat both Claude and Gemini.



At this point I only included accounts with >20,000 followers, and the community page lists the 20 most 'central' users of each community, even if more people are included in the community.

"Internet Weirdos" sounds kind of negative, sorry about that, I'll see if the LLM can come up with something better. Or do you have a suggestion of what is a good name for your community?


"Extremist Voices" is worse! Looks like your your LLM meant "researchers studying extremism".


The site is unreachable for me?

Great to see so many good analytics tools pop up for bsky, I'm also working on one that primarily focuses on the fastest growing user accounts if you're interested: https://blue.facts.dev/trending


Now it's working again, great tool. Would also be great to see the growth of the feeds within the category view.


thank you!


Great site. Feature request: two essential categories for our family are missing: princesses and unicorns.


It seems the apps started enforcing upgrades to Pro in ~February of this year.

On the AppBrain page you can see the rating nosedive from 4.6 stars (out of 5) to less than 2: https://www.appbrain.com/app/simple-gallery/com.simplemobile...


Developer's response to several of the negative reviews complaining about their data being held hostage:

> Hey, it is just a tiny one time payment, you will never have to pay again :) lf you uninstall the paid app within 2 hours, you are automatically refunded. If you want a refund anytime later just contact us

While I understand and respect the developer's desire to monetize, creating a set of expectations and then pushing an update to require payment for accessing local data feels like ransom. Have to be careful with the trust users place in you.


Data hostage? All of these apps feature easy and painless import/export of data to an open format.


I do not have personal experience, and it appears that the F-Droid versions have no anti-features, but those comments on the Play Store were specifically talking about their data being held hostage.


> It seems the apps starting to enforce upgrades to pro in ~February of this year.

Only when installed from the Google Play Store.


Yeah, but the price is symbolic... and maintaining an app on Android, even without adding any features, requires some work to keep up with the changes made to the system.


Unfortunately, the charge to my credit card would not be merely symbolic.


AppBrain shows market share stats for all apps and also for the top ranked apps: https://www.appbrain.com/stats/libraries/details/flutter/flu...

6% of the top ranked apps use Flutter.


Thanks. This proves my point. Flutter as of today is not a viable target. Once the usage crosses, say, 25%, it might be worth revisiting.


From that same site: https://www.appbrain.com/stats/libraries/tag/app-framework/a...

React Native: 5.43% of apps (4.18% of installs)

Flutter: 4.22% of apps (1.39% of installs)

It's clear from the ratio of apps to installs that React Native is used by apps that are on average 3x more popular, but that isn't really a sign that the framework is less viable, just that more of the most popular apps were written using something else - and I'd speculate that in many cases those apps predated Flutter.

I actually find it more interesting that the number of apps written with Flutter compared to React Native is fairly similar. To me, that suggests that Flutter is gaining ground rapidly, because that very much wasn't the case when I first started using Flutter on my hobby project a few years back.

In any case, your 25% target seems unrealistic for any framework [1]. Unless your takeaway is also that React Native is not a viable target until it too hits 25%.

[1] I'm discounting Kotlin from these stats as it's not a framework [2], and similarly I don't understand why they counted the Android components as a framework.

[2] Actually, I'm surprised Kotlin is this way down in the charts... If native code is now more popular than Kotlin, that could cause compatibility issues now some phone manufacturers are starting to experiment with RISC-V instead of Arm.


Nice approach. It feels very similar to "tracer bullet" development (a term I think was coined by The Pragmatic Programmer), where you get something working end-to-end as quickly as possible and then start iterating on the parts. (https://www.swaroopch.com/tracer-bullet-development/ explains it too)


I used ChatGPT (because I have no access to Bard :) ) to convert their 180 country names to 3-letter codes and generate a world map showing where it's available:

https://twitter.com/thijser/status/1656943947556569090

Definitely not what I had expected with "180 countries"


Apparently it's 180 Countries and regions. So every little island is included, even if it's not a separate country. Someone at Google finally learned marketing I guess.


"Regions" is used so that it captures disputed territories such as Taiwan (TWN) and semi-autonomous regions which have country codes but aren't countries e.g Hong Kong (HKG) and Macau (MAC). You will notice that most airlines and international companies now refer to country dropdowns as region dropdowns, primarily to satisfy the Chinese government.


> Someone at Google finally learned marketing I guess.

Not sure if it's good marketing, though. The first impression of disappointment and annoyance (and lack of veracity) might stick.


That’s their brand. (Sorry, I couldn’t resist.)


Always assumed Canada had a kind of European flavor to it. Thanks Google for confirming it...


They for sure do; they spell a lot of words differently, as they use the French spelling, like neighbour and favour.


I've never considered it the French spelling. It's the British English spelling: https://trends.google.com/trends/explore?date=today%205-y&ge...

The US tends to drop the u's in a lot of words. It doesn't make the original word French.

That said, a lot of English words do come from French. In fact, the English word favour came from the Old French favor, apparently?

c. 1300, "attractiveness, beauty, charm" (archaic), from Old French favor "a favor; approval, praise; applause; partiality" (13c., Modern French faveur), from Latin favorem (nominative favor) "good will, inclination, partiality, support," coined by Cicero from stem of favere "to show kindness to," from PIE *ghow-e- "to honor, revere, worship" (cognate: Old Norse ga "to heed").


I literally can't tell how much you're joking. There's nothing French about the spelling of these contemporary English words, even though they have Norman roots.


I didn't know English and French were only a single letter apart in some cases. Is there a common root of some kind?


> Is there a common root of some kind?

Yes, the common root is French, as in "Normans" :)

https://en.wikipedia.org/wiki/Normans


One of the neat linguistic things still in English from the Norman Conquest: the word for the meat in English (which traces back to Germanic roots) is often the word for the animal in French.

Meat from cattle is beef. A steer in French is a bœuf ( https://en.wiktionary.org/wiki/beef : From Middle English beef, bef, beof, borrowed from Anglo-Norman beof, Old French buef, boef (“ox”) )

Meat from a chicken is poultry. Chicken is poulet in French. ( https://en.wiktionary.org/wiki/poultry : From Middle English pultrie, from Old French pouleterie, from poulet, diminutive of poule (“hen”), from Latin pullus (“chick”). )

Meat from a swine is pork. The word for swine in French is porc. ( https://en.wiktionary.org/wiki/pork From Middle English pork, porc, via Anglo-Norman, from Old French porc (“swine, hog, pig; pork”), from Latin porcus (“domestic hog, pig”).)

This is because when the Normans (who were the rulers at the time) wanted poulet on the table, they didn't want a live chicken - they wanted a cooked chicken, and so the words for the meat and the animal diverged in English.

There are also some interesting Spanish / Arabic word pairs from https://en.wikipedia.org/wiki/Muslim_conquest_of_Spain where the word in Spanish differs from the romance side of the family tree.


Meat from a chicken is chicken. The class of edible animals to which chickens belong, and the general term for their meat, is poultry.


Very basic words in English (e.g. water, man, milk, drink) tend to have Germanic (Anglo-Saxon) roots, whereas newer, more abstract words tend to have come in after the 1066 Norman Conquest, with French words eclipsing their Anglo-Saxon equivalents as the Norman aristocracy supplanted the Anglo-Saxon rulers.



Well they do speak French


And English!


But (almost) no German. The most spoken (native) language in Europe.


According to Wikipedia (and unsurprisingly), that's technically Russian. Western Europe, perhaps?

https://en.m.wikipedia.org/wiki/List_of_languages_by_number_...


Thanks for the correction. I was under the assumption Russian was second.


wow, I did not know that!

Another fun fact I learned from Wikipedia: "German-Americans make up the largest self-reported ancestry group within the United States accounting for roughly 49 million people and approximately 17% of the population of the US"


Weird that it's not available in Brazil. That's a big market, no?


Google has been at odds with Brazil's judicial system, likely doesn't want to add more controversial fuel to the fire.


Brazil implements a local copy of GDPR, "LGPD".


So does Turkey with KVKK, but Bard is available in Turkey. There must be something else.

