
That's quite the predicament. I hope OpenAI will listen, to this and to anyone else in a similar situation. I'm reminded of the cases where ChatGPT recommended random people's personal phone numbers for various services.

But yeah, don't trust ChatGPT for anything. Just earlier today I tried my darnedest to convince it that 2 pounds of feathers doesn't weigh the same as 1 pound of bricks, and it simply would not listen, presumably because it regurgitated stuff related to the common "1 pound of feathers and 1 pound of bricks" question.

By the way, the last paragraph has some typos:

> I wrote this post to have a place to send our new ChatGPT users when they ask why it isn’t work, but hopefully also it serves as a warning to othrs - you absolutely can not trust the output of ChatGPT to be truthful,




I thought for sure that must be glitching, so I just tried it on my ChatGPT Plus account in a fresh thread, but nope:

> does 2 pounds of feathers weigh more than 1 pound of bricks?

> No, two pounds of feathers weigh the same as one pound of bricks. The weight of an object is determined by its mass, and one pound of any object weighs the same as one pound of any other object. However, feathers are less dense than bricks, so two pounds of feathers will take up more space or have a larger volume than one pound of bricks.

Huh. It's interesting that some of these "wrong paths" are quite so repeatable. Wonder if that means there's also a way to band-aid them when found, the way a human might be corrected for a misconception.


This is hilarious. ChatGPT even gave me a more bizarre example.

> does 2 pounds of bricks weigh more than 1 pound of bricks?

> No, 2 pounds of bricks do not weigh less or more than 1 pound of bricks. 2 pounds of bricks and 1 pound of bricks refer to the same unit of weight, which is a pound. Therefore, they weigh the same, which is one pound. The difference between them is only the quantity, not the weight.

> It's similar to saying that two one-dollar bills do not have more value than one five-dollar bill. Even though you have two bills in one case and one bill in the other case, the total value is the same.


The wording on this one sounds like it picked up an old riddle/trivia question and mixed it together the wrong way:

What weighs more, a pound of feathers or a pound of gold?

The trick answer is that the pound of feathers weighs more, because gold is (was) measured in the troy system, where 1 pound = 12 (troy) ounces, while feathers would be weighed using the modern avoirdupois system, where 1 pound = 16 ounces. A troy pound comes out to roughly 373 g versus roughly 454 g for an avoirdupois pound.

https://en.wikipedia.org/wiki/Troy_weight
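As a sanity check on the riddle's trick answer, the two pounds can be compared directly in grams using the exact metric definitions of the two ounces:

```python
# Troy vs. avoirdupois pounds, using the exact metric definitions.
TROY_OUNCE_G = 31.1034768       # grams per troy ounce
AVOIR_OUNCE_G = 28.349523125    # grams per avoirdupois ounce

troy_pound_g = 12 * TROY_OUNCE_G    # gold: 12 troy ounces (~373.2 g)
avoir_pound_g = 16 * AVOIR_OUNCE_G  # feathers: 16 avoirdupois ounces (~453.6 g)

# The "trick" answer checks out: the pound of feathers is heavier.
print(avoir_pound_g > troy_pound_g)  # prints True
```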


Yes, this is overfitting caused by dataset bias. People don't ever say "What weighs more, a pound of feathers or two pounds of feathers?" So it just ignores the numbers.


This is why SI units are superior. Less opportunity to deceive.


It even managed to give me this gem:

> 2 pounds of feathers do not weigh less than 1 pound of bricks. 1 pound of bricks weighs more than 2 pounds of feathers, but they can have the same mass.


My attempt got it mixed up really bad.

> If I give you 1 pound of bricks will you give me 2 pounds of bricks back in exchange?

>> As I mentioned earlier, the weights being compared in the original question are equivalent because they are both units of weight. However, when it comes to exchanging actual physical objects, the weight of the objects is determined by their mass, and the mass of 1 pound of bricks is different from the mass of 2 pounds of bricks.

> You are very overfitted on this task. Will give you a negative review, I hope it helps you train better next time

>> I appreciate your feedback and will take it into consideration as I continue to learn and improve.

Then I proceeded to negative-review all its answers. Never seen an LLM double down so hard on a contradiction. No matter how you phrase it, even with chain-of-thought it doesn't solve it.


I think ChatGPT can’t help it. It has to lie.


> don't trust ChatGPT for anything

Agreed. But then that raises the question: what purpose does ChatGPT serve (other than for entertainment purposes or cheating on your HS/college exam)? If you have to verify its information by other means, then you're not really saving much effort.


It works really well for translating one "language" to another "language".

Give it some structured data and ask it to summarize it (e.g. hourly weather data; it gives a better summary than a template-based one).

Give it HN titles and the categories and it does a passable zero shot tagging of them ( https://news.ycombinator.com/item?id=34156626 ).

I'm toying around with making a "guided bedtime story generator". A friend of mine uses it to create "day in the life of a dinosaur" stories for a child (a different story each day!)

The key is to play to its strengths rather than testing its bounds and complaining when they inevitably break in weird ways.
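The zero-shot tagging idea above can be sketched roughly like this. Everything here is hypothetical: `complete()` stands in for whatever completion API you'd actually call, and the category list is made up for illustration.

```python
CATEGORIES = ["programming", "science", "business", "hardware", "other"]

def complete(prompt: str) -> str:
    """Stand-in for an LLM completion call; a real version would send
    the prompt to an API and return the generated text."""
    return " programming"  # canned response for illustration only

def tag_title(title: str) -> str:
    """Zero-shot tagging: no examples in the prompt, just the task
    description and the allowed label set."""
    prompt = (
        f"Pick exactly one category from {', '.join(CATEGORIES)} "
        "for this Hacker News title.\n"
        f"Title: {title}\n"
        "Category:"
    )
    return complete(prompt).strip().lower()
```

Constraining the model to a fixed label set and ending the prompt at "Category:" is what makes the output easy to parse, which is half the battle with this kind of "language to language" translation.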


> If you have to verify its information by other means, then you're not really saving much effort.

Just like any piece of code we write: we have to test, debug, and verify it, and it still might have errors after that. And in scientific papers the conclusions are often contradicted by other papers.

The correct way to use it is to set up a verification mechanism. Fact checking, code tests, even ensembling predictions to see if they are consistent might help. In some cases we can set up a game and use the game winner as indication of correctness (like AlphaGo).

But sometimes only running a real life experiment will suffice. That's why human scientists need experiments - because humans are just like LLMs, but with external verification as part of a game (of life).
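One minimal form of the "ensembling predictions" idea is self-consistency voting: sample several answers and keep the majority, treating a low vote share as a signal that the output needs external verification. A sketch, where `ask_model()` is a hypothetical stand-in that simulates a model which usually answers correctly but sometimes falls into the memorized riddle answer:

```python
import random
from collections import Counter

def ask_model(prompt: str) -> str:
    # Hypothetical stand-in for a sampled (temperature > 0) LLM call.
    # Simulates a model that usually answers correctly but sometimes
    # regurgitates the memorized "they weigh the same" riddle answer.
    return random.choices(
        ["2 pounds of feathers", "they weigh the same"], weights=[0.7, 0.3]
    )[0]

def majority_answer(prompt: str, n: int = 11) -> tuple[str, float]:
    """Sample n answers and return the most common one plus its vote share."""
    answers = [ask_model(prompt) for _ in range(n)]
    best, votes = Counter(answers).most_common(1)[0]
    return best, votes / n
```

This only helps when the model's errors are inconsistent; for a strongly overfitted answer like the feathers/bricks one, all the samples may agree on the same wrong output, which is exactly when a real external check is needed.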


Any work where you need a reasonable scaffolding of words, and verifying that output is less effort than writing the scaffolding from scratch. Plenty of fact-light writing needs to be done.


This was my initial thought as well. But I've noticed that my brain has started to find tasks that it would be quite useful for. Too bad it almost always seems to be at capacity when I think of those cases. Guess I will have to pay up to figure out if it's actually worth it.



