
That's quite the predicament. I hope OpenAI will listen, to this and to anyone else in a similar situation. I'm reminded of the cases where ChatGPT recommended random people's personal phone numbers for various services.

But yeah, don't trust ChatGPT for anything. Just earlier today I tried my darnedest to convince it that 2 pounds of feathers doesn't weigh the same as 1 pound of bricks, and it simply would not listen, presumably because it regurgitated stuff related to the common "1 pound of feathers and 1 pound of bricks" question.

By the way, the last paragraph has some typos:

> I wrote this post to have a place to send our new ChatGPT users when they ask why it isn’t work, but hopefully also it serves as a warning to othrs - you absolutely can not trust the output of ChatGPT to be truthful,




I thought for sure that must be glitching, so I just tried it on my ChatGPT Plus account in a fresh thread, but nope:

> does 2 pounds of feathers weigh more than 1 pound of bricks?

> No, two pounds of feathers weigh the same as one pound of bricks. The weight of an object is determined by its mass, and one pound of any object weighs the same as one pound of any other object. However, feathers are less dense than bricks, so two pounds of feathers will take up more space or have a larger volume than one pound of bricks.

Huh. It's interesting that some of these "wrong paths" are quite so repeatable. Wonder if that means there's also a way to band-aid them when found, the way a human might be corrected for a misconception.


This is hilarious. ChatGPT even gave me a more bizarre example.

> does 2 pounds of bricks weigh more than 1 pound of bricks?

> No, 2 pounds of bricks do not weigh less or more than 1 pound of bricks. 2 pounds of bricks and 1 pound of bricks refer to the same unit of weight, which is a pound. Therefore, they weigh the same, which is one pound. The difference between them is only the quantity, not the weight.

> It's similar to saying that two one-dollar bills do not have more value than one five-dollar bill. Even though you have two bills in one case and one bill in the other case, the total value is the same.


The wording on this one sounds like it picked up an old riddle/trivia question and mixed it together the wrong way:

What weighs more, a pound of feathers or a pound of gold?

The trick answer is that the pound of feathers weighs more, because gold is (was) measured in the troy system, where 1 pound = 12 (troy) ounces, while feathers would be weighed using the modern avoirdupois system, where 1 pound = 16 ounces. A troy pound comes out to roughly 373 g versus roughly 454 g for an avoirdupois pound.

https://en.wikipedia.org/wiki/Troy_weight
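As a sanity check on the riddle's trick answer, the two pounds can be compared directly in grams using the exact metric definitions of the two ounces:

```python
# Troy vs. avoirdupois pounds, using the exact metric definitions.
TROY_OUNCE_G = 31.1034768       # grams per troy ounce
AVOIR_OUNCE_G = 28.349523125    # grams per avoirdupois ounce

troy_pound_g = 12 * TROY_OUNCE_G    # gold: 12 troy ounces (~373.2 g)
avoir_pound_g = 16 * AVOIR_OUNCE_G  # feathers: 16 avoirdupois ounces (~453.6 g)

# The "trick" answer checks out: the pound of feathers is heavier.
print(avoir_pound_g > troy_pound_g)  # prints True
```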


Yes, this is overfitting caused by dataset bias. People don't ever say "What weighs more, a pound of feathers or two pounds of feathers?" So it just ignores the numbers.


This is why SI units are superior. Less opportunity to deceive.


It even managed to give me this gem:

> 2 pounds of feathers do not weigh less than 1 pound of bricks. 1 pound of bricks weighs more than 2 pounds of feathers, but they can have the same mass.


My attempt got it mixed up really bad.

> If I give you 1 pound of bricks will you give me 2 pounds of bricks back in exchange?

>> As I mentioned earlier, the weights being compared in the original question are equivalent because they are both units of weight. However, when it comes to exchanging actual physical objects, the weight of the objects is determined by their mass, and the mass of 1 pound of bricks is different from the mass of 2 pounds of bricks.

> You are very overfitted on this task. Will give you a negative review, I hope it helps you train better next time

>> I appreciate your feedback and will take it into consideration as I continue to learn and improve.

Then I proceeded to negative-review all its answers. Never seen an LLM double down so hard on a contradiction. No matter how you phrase it, even with chain-of-thought it doesn't solve it.


I think ChatGPT can’t help it. It has to lie.


> don't trust ChatGPT for anything

Agreed. But then that raises the question: what purpose does ChatGPT serve (other than for entertainment purposes or cheating on your HS/college exam)? If you have to verify its information by other means, then you're not really saving much effort.


It works really well for translating one "language" to another "language".

Give it some structured data and ask it to summarize it (e.g. hourly weather data; it gives a better summary than a template-based one).

Give it HN titles and the categories and it does a passable zero shot tagging of them ( https://news.ycombinator.com/item?id=34156626 ).

I'm toying around with making a "guided bedtime story generator". A friend of mine uses it to create "day in the life of a dinosaur" stories for a child (a different story each day!)

The key is to play to its strengths rather than testing its bounds and complaining when they inevitably break in weird ways.
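The zero-shot tagging idea above can be sketched roughly like this. Everything here is hypothetical: `complete()` stands in for whatever completion API you'd actually call, and the category list is made up for illustration.

```python
CATEGORIES = ["programming", "science", "business", "hardware", "other"]

def complete(prompt: str) -> str:
    """Stand-in for an LLM completion call; a real version would send
    the prompt to an API and return the generated text."""
    return " programming"  # canned response for illustration only

def tag_title(title: str) -> str:
    """Zero-shot tagging: no examples in the prompt, just the task
    description and the allowed label set."""
    prompt = (
        f"Pick exactly one category from {', '.join(CATEGORIES)} "
        "for this Hacker News title.\n"
        f"Title: {title}\n"
        "Category:"
    )
    return complete(prompt).strip().lower()
```

Constraining the model to a fixed label set and ending the prompt at "Category:" is what makes the output easy to parse, which is half the battle with this kind of "language to language" translation.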


> If you have to verify its information by other means, then you're not really saving much effort.

Just like any piece of code we write: we have to test, debug, and verify it, and it still might have errors after that. And in scientific papers the conclusions are often contradicted by other papers.

The correct way to use it is to set up a verification mechanism. Fact checking, code tests, even ensembling predictions to see if they are consistent might help. In some cases we can set up a game and use the game winner as indication of correctness (like AlphaGo).

But sometimes only running a real life experiment will suffice. That's why human scientists need experiments - because humans are just like LLMs, but with external verification as part of a game (of life).
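One minimal form of the "ensembling predictions" idea is self-consistency voting: sample several answers and keep the majority, treating a low vote share as a signal that the output needs external verification. A sketch, where `ask_model()` is a hypothetical stand-in that simulates a model which usually answers correctly but sometimes falls into the memorized riddle answer:

```python
import random
from collections import Counter

def ask_model(prompt: str) -> str:
    # Hypothetical stand-in for a sampled (temperature > 0) LLM call.
    # Simulates a model that usually answers correctly but sometimes
    # regurgitates the memorized "they weigh the same" riddle answer.
    return random.choices(
        ["2 pounds of feathers", "they weigh the same"], weights=[0.7, 0.3]
    )[0]

def majority_answer(prompt: str, n: int = 11) -> tuple[str, float]:
    """Sample n answers and return the most common one plus its vote share."""
    answers = [ask_model(prompt) for _ in range(n)]
    best, votes = Counter(answers).most_common(1)[0]
    return best, votes / n
```

This only helps when the model's errors are inconsistent; for a strongly overfitted answer like the feathers/bricks one, all the samples may agree on the same wrong output, which is exactly when a real external check is needed.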


Any work where you need a reasonable scaffolding of words, and verifying that output is less effort than writing the scaffolding from scratch. Plenty of fact-light writing needs to be done.


This was my initial thought as well. But I've noticed that my brain has started to find tasks that it would be quite useful for. Too bad it almost always seems to be at capacity when I think of those cases. Guess I will have to pay up to figure out if it's actually worth it.



