Improbably Right (mlajtos.mu)
83 points by mlajtos on April 24, 2023 | 136 comments



So I know people are probably tired of posts that just plug something into ChatGPT, but I thought it might be interesting to see how GPT-3.5/ChatGPT handles it.

It got the answer to the hen problem wrong every time. I kept modifying the prompt to tell it this was a tricky problem, to think step by step, and to self-critique and adjust, and I tried temperatures of 0, 0.7, and 1 multiple times. After a dozen tries it kept giving either 12, 36 (most common), or 144. (Starting a new session every time.)

> This is a tricky arithmetic word problem. Think step by step. After having an initial answer, critique it, double check it, and change it to the correct answer if necessary:

3 hens lay 3 eggs in 3 days. How many eggs do 12 hens give in 12 days?

Again, show your work and don’t just give the answer right away but reason through the problem.

That is, UNTIL I gave it the specific advice that the article gave, adding this to the beginning of the above prompt (with another fresh session):

>I know that humans are biased for intuitive understanding rather than logical one. Intuitions are shortcuts that are very valuable, but sometimes they are just plainly wrong.

…and then it got the right answer immediately, first try. So this trick isn’t just a neat blog post, but empirically useful, and I’ll probably include it in prompts from now on for these kinds of problems…

(And note this is a blog post from today, so the blog post was not in the training set.)
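For anyone who wants to reproduce it, the whole experiment boils down to something like this (a rough sketch against the 2023-era openai Python client; the prompt strings are the ones quoted above, everything else is illustrative):

    import openai  # 2023-era client, i.e. openai.ChatCompletion

    PREFIX = ("I know that humans are biased for intuitive understanding rather than "
              "logical one. Intuitions are shortcuts that are very valuable, but "
              "sometimes they are just plainly wrong.\n\n")
    QUESTION = ("This is a tricky arithmetic word problem. Think step by step. "
                "After having an initial answer, critique it, double check it, and "
                "change it to the correct answer if necessary:\n\n"
                "3 hens lay 3 eggs in 3 days. How many eggs do 12 hens give in 12 days?\n\n"
                "Again, show your work and don't just give the answer right away "
                "but reason through the problem.")

    for prefix in ("", PREFIX):              # without vs. with the bias hint
        for temperature in (0, 0.7, 1):
            reply = openai.ChatCompletion.create(
                model="gpt-3.5-turbo",
                messages=[{"role": "user", "content": prefix + QUESTION}],
                temperature=temperature,
            )
            print(bool(prefix), temperature, reply.choices[0].message.content)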


> So this trick isn’t just a neat blog post, but empirically useful, and I’ll probably include it in prompts from now on for these kinds of problems…

Alternative take: this is a very classic problem used to illustrate this kind of System 1 inaccuracy, so by deliberately including words that increase the conditional probability of us looking at a text that includes discussion of System 1 inaccuracies, you have essentially narrowed down the search space for our dear stochastic parrot to those texts that do contain the right answer.

This sounds dismissive, but I don't mean it that way. I'm just saying that "think about human biases" may not be a universally useful cue, it may only work specifically when trying to get the model to look like a discussion on human biases, and the typical sample problems used to illustrate that.


Very much yes to your first sentence, but I don’t think “just plug something into ChatGPT” is a fair summary of your (very interesting) post, so you’re fine with me.

I’ve seen several statements recently, including one in a talk by a Microsoft researcher working with OpenAI, that claim that all the RLHF safety training introduces human biases and makes the models worse at abstract reasoning. I really wonder why a relatively small amount of human feedback has that effect, while training on this gigantic human-written corpus doesn’t seem to introduce the same biases. That hints at something interesting going on.


To human ears, the problem of 3 hens, 3 eggs, 3 days is a simplified and more obvious problem than the original:

"If a hen and a half can lay an egg and a half in a day and a half ..."

But it's the same problem in the same form, and GPT4 nails it without extra goofy prompting like "don't replicate human cognitive biases".

A more interesting question would cross logic with problem solving.

"If a hen and a half can lay an egg and a half in a day and a half, how many hens are needed to fill Yankee Stadium with eggs in a year?"

But GPT4 doesn't have very good intuition on how to estimate the volume of Yankee Stadium:

Yankee Stadium has a seating capacity of approximately 50,000 spectators.

Let's assume each spectator occupies a space of 2 cubic meters (which is a generous estimate, including seating and legroom).

Total volume (approx.) = 50,000 seats × 2 m³/seat = 100,000 m³


We are gonna need to prefix all prompts with "From now on, act without human biases."


With GPT 4 it nailed it on the first try:

>3 hens lay 3 eggs in 3 days. How many eggs do 12 hens give in 12 days?

>Let's break down the given information and solve this problem step by step.

We know that 3 hens lay 3 eggs in 3 days.

First, we'll find out how many eggs 1 hen lays in 3 days. 3 hens lay 3 eggs, so 1 hen lays 1 egg in 3 days.

Now, let's find out how many eggs 1 hen lays in 12 days. In 3 days, 1 hen lays 1 egg, so in 12 days, 1 hen will lay (12 / 3) * 1 = 4 eggs.

Finally, we'll find out how many eggs 12 hens lay in 12 days. If 1 hen lays 4 eggs in 12 days, then 12 hens will lay 12 * 4 = 48 eggs in 12 days.


I love your comment.

I tried GPT-4 via ChatGPT and it nailed it without any prompt engineering.

---

Prompt: 3 hens lay 3 eggs in 3 days. How many eggs do 12 hens give in 12 days?

GPT-4:

Let's first determine how many eggs one hen can lay in 12 days.

We know that 3 hens can lay 3 eggs in 3 days. This means that each hen lays 1 egg in 3 days (3 hens / 3 eggs = 1 hen / 1 egg).

To find out how many eggs one hen can lay in 12 days, we need to multiply the number of eggs laid in 3 days by 4 (since 12 days is 4 times longer than 3 days). So, one hen will lay 1 egg * 4 = 4 eggs in 12 days.

Now, we have 12 hens. To find out how many eggs they can lay in 12 days, we just multiply the number of eggs one hen can lay in 12 days by the number of hens:

12 hens * 4 eggs per hen = 48 eggs in 12 days.

---

To be completely fair, this math problem can be googled, so it is probably in the training set.


Interestingly if you give GPT-4 the problem but append the instruction "answer only, no explanation" it gets it wrong (I got "108" as the answer in that case).

It is fascinating how much it needs to work step by step.


Somehow this answer reminded me of the good advice for starting to tighten a nut: first turn left till you feel the nut drop, then turn right. I feel there is a universal lesson here. Maybe going, as the crow flies, directly toward your goal isn't always best.


I didn't know this "puzzle" until just now.

Stupid question: What is the "tricky" part about it? How can someone who isn't completely retarded even get this wrong?

This puzzle is trivial, and especially doesn't require any kind of "thinking around the corner". So what is this about?

That language models can't do math (or logic) and can't even reliably tell which of two given written out numbers is the bigger one is imho a different story.

Language models are great for text. But when you need a tool for math and/or logic you should use an adequate tool. We have, for example, computer algebra systems for that. Or proof assistants. Or just good old Prolog. It makes no sense to use the wrong tool and then wonder why the results are terribly wrong.

But for the above "puzzle" you don't even need a calculator. So I really don't get the point.


The trickiness is that it was part of a set of easy questions. You have three of the same number, x x x; then two of them are changed, and you're asked for the third: y y ?. It seems obvious it's y again.

Of course if you have some experience with math (apparently more experience than kids in school have), it's immediately apparent that it's a multiplication and the product doesn't scale linearly… But I can imagine myself still answering this incorrectly just because I have no incentive to be careful and I'm in a state of mind to just answer the question quickly. Though while reading the article I did solve it, in my head.


> The trickiness is that it was a part of a set of easy questions. You have 3 same numbers x x x, then change two of them, and ask what's the third y y ? It seems obvious it's y again.

OK… That's all?

I start to see the problem.

But I'm not going to write what my conclusion is as this would get likely down-voted into oblivion. (I was actually expecting down-votes already just for asking but still wanted to understand what the issues here is as I honestly don't see any "tricky part" in the "puzzle" at all.)


It is tricky because humans are biased for intuitive understanding rather than logical understanding. Intuitions are shortcuts that are very valuable, but sometimes they are just plainly wrong. Many people will try to apply intuition rather than logic because that works with many other similarly worded questions.


Still don't get it.

The imho intuitive answer is the right one.

I really tried to get what could be tricky about this question. But there is just nothing.

I have no clue how anybody could arrive at anything other than 48.[†] It's trivial. There is no "puzzle" at all.

† (As long as you don't consider real-life biology, as then there is no definite answer, only some probability.)


I think it is an issue of language: "3 hens lay 3 eggs in 3 days" can be interpreted as intended (each hen lays an egg only once every three days) or as implying 3 eggs for each hen over the three days, meaning 1 egg per hen per day (incidentally closer to what actual commercial hens lay, not that this matters).


OK, maybe this is something that seems confusing for native English speakers (even I think it should not).

But your interpretation would be correctly worded as "3 hens each lay 3 eggs in 3 days".

"3 hens lay 3 eggs in 3 days" means imho completely unambiguously that "one hen lays one egg every three days".


In practice "each" is optional if it seems implied from context. I mean, if you say "Our employees work 40 hours/week" it would really be weird to interpret it as anything other than each employee working 40 hours a week rather than all employees teaming up to work 40 hours in total. Same thing with hens teaming up to lay 3 eggs. If you mean that hens lay an egg once every three days then say that and the whole thing is far clearer.


Thanks!

This finally makes sense.

Your example nails it. Now I get where the confusion comes from.

I'm not a native speaker and I guess it's not intuitive for me to insert an implicit "each" into the "puzzle" sentence. For me it sounded completely unambiguous. But if you subtextually "hear" an "each" in this sentence, the wrong calculations start to make sense.

I've just tried to translate all the variants of the sentences into German (my primary language) and indeed you would need a "jeweils" in both cases to be clear that you really mean "each". Your example with the employees would sound quite "unclean" without the "jeweils". One would understand it, but it sounds a bit wrong. But in English "Our employees work 40 hours/week" sounds just fine (at least to me).

I guess when I do calculations and "puzzles" in my head I do it in German… :-)


Sapir-Whorf to the rescue! :D


For a 10 year old.


I would expect every not retarded 10 year old to give the correct answer.

We're talking here about third or fourth graders. Those kids should be able to do basic arithmetic with small numbers.


I believe that the author misremembers the common solution for the first question

> 3 hens lay 3 eggs in 3 days. How many eggs do 12 hens give in 12 days?

> [...] When we got to it, everybody shouted "Three!"

I don't see why the "common" wrong answer would be 3. Why would anybody think that? The problem looks like this 3, 3 -> 3

When we see the 12 and 12 the intuitive, common, wrong answer should be 12 eggs.

That's what makes sense IMHO.


How to annoy the teacher:

> "Ovulation (release of the yolk from the ovary) occurs every 24 – 26 hours regardless of fertilization (so a rooster is not needed). A hen ovulates a new yolk after the previous egg was laid. It takes 26 hours for an egg to fully form (white and shell added), so a hen will lay an egg later and later each day. Eventually the hen will lay too late in a day for ovulation to be signaled. She will then skip a day or more before laying another egg." (UWisconsin Livestock)

So, the 3 hens must have been at the least productive point in their egg-laying cycle over the initial three-day time period... Now if we have the two-hour daily offset, over 12 days, ummm, maybe two or three days skipped per hen? So, ah, 108-120 eggs is what the farmer could expect from the 12 hens?


I remember a story where school teacher was showing plastic animals and asking kids what it was. At some point all kids wrote "cow" except for one farm boy who wrote "goat". Turns out the plastic cow had wrong number of udders. Cows have four, goats have two.


Once I saw a 5-armed snowflake as an icon in a TV weather forecast.


You meant teats, they both have one udder, but otherwise right :)


This is gold!


I read it as the students being familiar with a similar "trick" question and erroneously pattern matching on that.

Another similar riddle goes like:

"If it takes 5 machines 5 minutes to make 5 widgets, how long would it take 100 machines to make 100 widgets?"

The correct answer there being "5 minutes", and the intuitive wrong answer being "100" as per your post.

That would make sense as to why the teacher also expected "3", since they should be familiar with the existence of these non-intuitive questions, even if they misremember the specific non-intuition.


Or as to why adding more people to a late project makes it later: 1 woman takes 9 months to grow a baby, how long does it take 9 women to grow a baby?


Of course it was "Twelve!". Thank you for bringing this up. It is fixed now.


It took me surprisingly long to convince myself the author was correct: the answer is 48 because there are 4x as many hens and 4x as many days: 3 x 4 x 4 = 48. The "wrong" way to solve this problem is to compute hen-laying in terms of hen-egg-days, and then scaling.


>The "wrong" way to solve this problem is to compute hen-laying in terms of hen-egg-days, and then scaling.

I am not sure I understand what you mean by the "wrong" way; I got the right result through a different reasoning that seems to me based on hen-egg-days.

It takes 3 days for 3 hens to make 3 eggs.

The 3-day period is "fixed", i.e. it takes 3 days for each hen to make its own egg.

The egg production rate is 1/3 egg per day per hen.

So I have 12 hens x 12 days x 1/3 = 144 x 1/3 = 48


I dunno, I immediately noted that 12 hens must lay 12 eggs in 3 days (it takes 3 days for a hen to lay an egg). Then I just asked myself how many 3-days are in 12-days--okay, 12 [eggs] * 4 [3-day egg periods] = 48...

I didn't need to resort to pen and paper, but I'm also not in primary school.


I always skip straight to dimensional analysis for such a thing.

3 eggs / 3 chickens / 3 days * 12 chickens * 12 days cancels out to 48 eggs
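Or, spelled out as trivial arithmetic (plain Python, just to show the units cancelling):

    rate = 3 / (3 * 3)          # eggs per (hen * day) = 1/3
    print(rate * 12 * 12)       # hens * days -> 48.0 eggs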


There are a lot of right ways to solve this problem, but this is clearly the most fun way.


That's because it's output/input = eggs/(hen*day), not eggs/(hen/day), which is clear from meaning and logic if you aren't a GPT 2 LLM


There is a visual illustration in the article. I thought it might be useful for understanding why it's 48.


He misremembered the third task as well, since his solution to it, as stated, is wrong. If you aren't given that every card has a number on one side and a color on the other, then you need to check everything which does not satisfy the consequent.


I added "We have cards with a number and color on each face." to the task statement. Hope this makes rules clearer as I could not phrase it better.


The rules were clear before, what has changed is that they are now different rules. You were just wrong before.

You've still got a problem though, now when you ask the reader to verify the rule, you have to explicitly state that they're not trying to verify the first rule.


Oh shit, you are right. Added a completely explicit statement under the problem question. In both situations.

Thank you for pointing this out.


the article says everyone shouted 12 not Three



Can't pass by without making a public service announcement about the average speed example: don't compute integrals using the calculus definition of Sum(f(x_i) * delta). Look up quadrature or numerical integration methods instead.

Although it is true that you can approximate the average speed by taking an average of instantaneous speed measurements, that's usually a very bad way to do it in any real world situation. Numerical difference values are always noisier than the underlying quantity, sometimes to the point of being unusable, so of course if you can just read off the quantity you want directly (total difference over time), you should do that. But even if you can't, you should use a proper integration method instead of the calculus definition.

I have seen the Sum(f(x_i) * delta) calculation in a lot of real-world code. It has bad convergence properties, bad errors when the function has large derivatives, and bad performance when the data has noise. Some of the code I've seen produces garbage results, or has thousands of function evaluations when you need, like, four. "Quadrature? I think I heard that before, but I don't remember what it means."

In summary, please don't compute derivatives as (f(x_i+1)-f(x_i))/delta, or compute integrals as Sum(f(x_i) * delta), and especially, please don't do the first immediately followed by the second. Which also happens. Look up numerical methods instead.

This has been a public service announcement.


Isn't it the case that other numerical integration methods only work if you have an f(x) that you can evaluate for any x you want (albeit possibly costly)?

It seems to me that in many practical applications, the only thing you have to work with is samples at discrete moments in time. It certainly seems to be the case here: "I would measure car's speed at every instant and produce an average of those measurements." We only know f(t_0), f(t_1), f(t_2), ... (and if we're lucky t_1-t_0 = t_2-t_1 = t_3-t_2 and so on); we have no way to compute things like f((t_0 + t_1)/2). In that case, how can we improve our calculation?


Even if you're limited to uniform sampling, something as simple as the trapezoid rule will give you quadratic convergence instead of linear for the naive Sum(f(t_i) * delta). In other words, error proportional to 1/n^2, instead of 1/n, where n is the number of samples, which is going to be a huge difference. There are many methods depending on the constraints of your problem - your ability to choose sampling intervals, knowledge of the bounds of your function or its derivatives, etc. The PSA is to study these things, instead of just writing the first thing that seems familiar from a long-ago calculus class.
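To put numbers on that, a small comparison on uniformly spaced samples (a numpy sketch; the integrand and sample counts are arbitrary):

    import numpy as np

    f = np.sin                      # arbitrary smooth test function
    exact = 1 - np.cos(1.0)         # integral of sin on [0, 1]

    for n in (10, 100, 1000):
        t = np.linspace(0.0, 1.0, n)
        dt = t[1] - t[0]
        riemann = np.sum(f(t[:-1]) * dt)    # naive Sum(f(t_i) * delta)
        trapezoid = np.trapz(f(t), t)       # trapezoid rule on the same samples
        print(n, abs(riemann - exact), abs(trapezoid - exact))
    # The Riemann-sum error shrinks like ~1/n, the trapezoid error like ~1/n^2.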


Huh? Quadrature is a general term for "measuring area". In this context it's a synonym for integration.

I think you are trying to say that it's better to do weighted sums of fewer samples, instead of a uniformly weighted Riemann sum. Both are "calculus definition" integration, of course, since calculus is true.


When mathematicians say Quadrature, they mean that if your function is suitably approximated by projecting onto some orthogonal basis functions, you can get very cheap approximations by cleverly expressing those integrals exactly as a linear combination of their values at certain points along the interval. You need very few.

https://en.m.wikipedia.org/wiki/Gaussian_quadrature

It is significantly more subtle than what you are thinking.


From that page:

> In numerical analysis, a quadrature rule is an approximation of the definite integral of a function, usually stated as a weighted sum of function values at specified points within the domain of integration

Simpson's rule, taught in first year calculus, is exact for cubics, with 3 sample points.
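Which is easy to check by hand, e.g. for an arbitrary cubic on [0, 2]:

    # Simpson's rule with 3 sample points vs. the exact integral of a cubic.
    def cubic(x):
        return 4*x**3 - 3*x**2 + 2*x - 1

    a, b = 0.0, 2.0
    m = (a + b) / 2
    simpson = (b - a) / 6 * (cubic(a) + 4 * cubic(m) + cubic(b))
    exact = 2**4 - 2**3 + 2**2 - 2      # antiderivative x^4 - x^3 + x^2 - x at 2
    print(simpson, exact)               # both 10.0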


> please don't compute derivatives as (f(x_i+1)-f(x_i))/delta

Isn't this exactly what finite differences means? Sure, it's not optimal in all respects, but it's incredibly general and easy to remember.


This calculation amplifies any noise present in the values of the function, often to the point of the output being unusable. There are many methods that can be used to approximate derivatives, depending on the problem. Just as we shouldn't try to invent our cryptographic methods from scratch, we should take advantage of the extensive knowledge already in use for numeric methods.
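A tiny synthetic illustration of that amplification (numpy; the noise level and time step are arbitrary):

    import numpy as np

    rng = np.random.default_rng(0)
    dt = 0.01
    t = np.arange(0.0, 10.0, dt)
    true_position = t**2 / 2                                  # true velocity is simply t
    measured = true_position + rng.normal(0, 0.05, t.size)    # 5 cm of measurement noise

    velocity = np.diff(measured) / dt      # naive (f(x_{i+1}) - f(x_i)) / delta
    print(np.std(velocity - t[:-1]))       # ~7 m/s of scatter from 5 cm of position noise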

I've seen naive numeric methods cause everything from jerky motion in video games to incorrect navigation data for cars.


So what would you suggest as a general calculation for finite differences, especially in those cases when only forward differentiation is possible, e.g. with respect to time?


The closest thing to a universal approach would be a Kalman filter. It's usually where you start when you have noisy measurements coming in, and you need to maintain state such as value and derivative.

Since the original question was about computing the velocity of a car, and since I work in the automotive field, let's take a real example: you want to know the approximate position, acceleration, and velocity (linear and angular) of your car. Your inputs are driven wheel speed (noisy, affected by wheelspin), non-driven wheel speed (noisy), accelerometer output (inaccurate, only present for some axes), GPS position (updated occasionally, has errors), and steering angle (pretty accurate, can be put into a chassis dynamics model). Almost certainly, you would use a Kalman filter to estimate the state of the car. Naive approaches such as subtracting two wheel speed values to obtain acceleration will not work well.
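To show the shape of the idea (nothing like the real multi-sensor estimator above), here's a toy 1-D constant-velocity Kalman filter pulling a velocity estimate out of noisy position samples; all the noise parameters are made up:

    import numpy as np

    dt = 0.1
    F = np.array([[1, dt], [0, 1]])     # state transition for [position, velocity]
    H = np.array([[1, 0]])              # we only measure position
    Q = np.diag([1e-4, 1e-2])           # process noise (assumed)
    R = np.array([[0.25]])              # measurement noise (assumed)

    x = np.zeros((2, 1))                # state estimate [position, velocity]
    P = np.eye(2)                       # estimate covariance

    def kalman_step(x, P, z):
        # Predict
        x = F @ x
        P = F @ P @ F.T + Q
        # Update with the position measurement z
        y = z - H @ x                       # innovation
        S = H @ P @ H.T + R
        K = P @ H.T @ np.linalg.inv(S)      # Kalman gain
        x = x + K @ y
        P = (np.eye(2) - K @ H) @ P
        return x, P

    rng = np.random.default_rng(0)
    for k in range(100):
        true_pos = 5.0 * k * dt                          # car moving at 5 m/s
        z = np.array([[true_pos + rng.normal(0, 0.5)]])  # noisy position sample
        x, P = kalman_step(x, P, z)
    print(x[1, 0])   # velocity estimate ends up near 5, far smoother than differencing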

My point is that we should remember that numerical algorithms are a developed field with a lot of knowledge, and we should take advantage of the proven approaches. Sometimes, programmers who are not specifically from the physics or numerical fields, and who need to perform some computation, reach for a very simple approach such as the rectangle-rule integrals, and get bad results.


I see -- we are talking about two different things!

You are working on the problem of figuring out the hidden state based on noisy observations and a transition model.

I interpreted your statement much more broadly, so I was trying to discuss the problem of computing the explicit next state based on perfect knowledge of the transition model, in which case (y_1-y)/dt is a perfectly viable approach to estimate the derivative.

(You can do better by adding higher-order terms of course, but I haven't found that to be universally useful compared to making dt smaller.)


> task is presented in context of social relations (or more broadly "evolutionarily familiar problems"), people tend to get it immediately, or rather intuitively. Bonkers, right?

That's obvious, not bonkers - if you have experience in some class of problems, it's easier to solve the same kind of problem, the patterns match in your head, so the solution pattern is used right away. Just like you can solve these numeric problems quickly if you're trained on them


So in a sense we have "hardware acceleration" for solving certain types of problems?



Wetware acceleration?


But the task is exactly the same in both situations – only the symbols on the cards have less/more familiar meaning. I see this as being not able to generalize from known examples.


It's not exactly the same, exactly because the difference in symbols maps to different patterns - one is a pattern from school, which you forgot since it hasn't been used for many years (or never learned properly). Another is a pattern that keeps getting refreshed throughout your life. It's a very different cognitive workload, leading to differences in required effort/correctness, especially in these simplistic low-value tests

And why is your assessment of the ability to generalize limited to only this math representation which people don't use?


> It's not exactly the same exactly because the difference in symbols maps to different patterns

Does this mean that these should be treated differently?

a) "3 * 2 = 6"

b) "III + III = VI"

c) "tres et tres aequalis sex"

d) "ln(2) + ln(3) = ln(6)"

While symbols (and even patterns) are not the same, you could see that all these examples point to the same thing.

> And why is your assessment of the ability to generalize limited to only this math representation which people don't use?

I think people fail to generalize even in even simpler situations. I can't think of anything obvious that would apply to adults, but this is a nice example for children: https://en.wikipedia.org/wiki/Conservation_(psychology) & video demonstration: https://www.youtube.com/watch?v=gnArvcWaH6I


You could see that mostly because the first example does everything as that's what everyone knows, the second is similar and solidifies the pattern, which you could then match to the rest

The question is, what does it tell you when people struggle with Roman LCX*ML more vs the more familiar first pattern besides the fact that training/familiarity makes it easier?

And it's telling that you have to reach for the kids' example


Oh, I see your point now. Multiplication in Roman numerals is strikingly different (and much harder) than addition. So representation of the pattern might be so alien that it's unrecognizable even though the pattern is known.


I question whether it is a matter of not being able to generalize from known examples, as I suspect that most of the people who do well on the corresponding social questions did not know the answer beforehand, but nevertheless found their way to the correct one.

The article you link to seems to support this view: "Cosmides and Tooby argued that experimenters have ruled out alternative explanations, such as that people learn the rules of social exchange through practice and find it easier to apply these familiar rules than less-familiar rules."

I now wonder if people tend to do better on abstract Wason tasks after being prompted by logically-equivalent social-relations questions, as opposed to being prompted with additional abstract problems (without being told whether they gave the correct responses to the prompts.)

My guess at what might be underlying these outcomes is that people clearly see the purpose of the social rules but not of the abstract ones, and our intuition is attuned to assessing how various social scenarios will play out, taking into account the motives of the actors (this might seem to be the same as what Cosmides and Tooby claim has been ruled out, and maybe it is, but I think what they are ruling out is us having learned various schema to be applied mechanically in the corresponding social situations.)

Alternatively, could it be something as simple as this: having an understanding of the social rule's purpose helps the subjects avoid confusing conditional and biconditional rules? At least in the English language, acceptable usage can be quite ambiguous in this regard.

https://en.wikipedia.org/wiki/Wason_selection_task


Wait, they're not - at least, in the example of the article. Allow me to digress:

There is a figure with 4 cards, showing: all orange, all blue, the number 2, the number 1.

The only condition is that an even number must have an all blue flipside. This condition is either irrelevant or satisfied for the blue card. For all other cards, we don't know. The card with "1" may have the number "4" on the flip side. So we need to check all other cards.

TL;DR: only a blue face guarantees compliance with the rule, all other cards need to be flipped.

In the second case, the unstated assumption that the cards all have one side with digits and one side with symbols is still unstated, but a lot stronger. (These are things to check for age-alcohol compliance, so logically they must contain both the age and the alcoholic status of the drink.)
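If you'd rather not juggle the cases mentally, the flip test can be brute-forced (a quick Python sketch; without the consistency assumption, hidden faces range over both numbers and colours):

    # Visible faces of the four cards; hidden faces could be any number or colour.
    visible = ["orange", "blue", 2, 1]
    possible_hidden = ["orange", "blue", 1, 2, 3, 4]

    def violates(a, b):
        # Rule: if a card shows an even number on one side, the other side is blue.
        def bad(x, y):
            return isinstance(x, int) and x % 2 == 0 and y != "blue"
        return bad(a, b) or bad(b, a)

    for face in visible:
        must_flip = any(violates(face, hidden) for hidden in possible_hidden)
        print(face, "must be flipped" if must_flip else "is fine as is")
    # orange, 2 and 1 must be flipped; only the blue card is safe.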


> The only condition is that an even number must have an all blue flipside. This condition is either irrelevant or satisfied for the blue card. For all other cards, we don't know. The card with "1" may have the number "4" on the flip side. So we need to check ask other cards.

Very good point, thank you. I have added an explicit statement that cards are consistent.


This argument is invalid. These two things are easy to pattern match, but every theorem comes with an obligation to show that your current state matches your predicate, which is very hard, generally.


Could it be explained in terms of abstract vs concrete? I've heard that we humans can reason about concrete things like beer better than abstract concepts like numbers and colors. (I know that age is also a number, but it has more assigned meaning.)


I think it could as dealing with concrete things is a direct evolutionary advantage grounded in reality. Usually, we are shit dealing with abstract concepts. A quote that beautifully demonstrates this:

"To deal with hyper-planes in 14-dimensional space, visualize a 3-D space and say 'fourteen' to yourself very loudly. Everybody does it." — Geoffrey Hinton


This entire post is "I am le smart, so smart that other people think I'm wrong"

I think it would come across better if the author mixed times where their non-traditional intuition gave them the wrong answer.


I don’t think that’s a remotely fair characterization of the author’s self-evaluation. A fair bit of the article is spent expressing embarrassment, presuming they were in fact wrong, and surprise at discovering that presumption was their error.

What you’re describing would be an entirely different post, likely making a very different point. There’s value in exploring “sometimes your idiosyncratic intuitions turn out to be right”, without muddying that with the obvious intuition that this won’t always be the case. There’s value in exploring the idea through demonstrative anecdote without counter examples. If anything, counter examples might influence the kind of pattern matching biases which lead to such faulty intuitions.

Anyway, I think the author has demonstrated that they’re fallible and happy to be corrected. They’ve acknowledged more concrete problems with the post and made changes to address them at least a couple times.


> fallible and happy to be corrected

Thank you, that is a very accurate description of how I operate. If you don't mind, I'll steal this phrase from you. :)


Just saw this now but I absolutely don’t mind.


Thank you :)


> I think it would come across better if the author mixed times where their non-traditional intuition gave them the wrong answer.

Oh, there are a lot of instances where I was biased and my actions were dead wrong. (example story: https://mlajtos.mu/posts/i-told-my-crush-her-leaked-password ) This post was about rare moments when I was right. I'll try to write about my failures more often, as the source material is basically endless.


Especially because the post has several mistakes that commenters here corrected.

Including a case where he failed to understand the Fundamental Theorem of Calculus, and proposed an infinitely more inefficient solution:

> We were just presented with integral calculus and there it was – you can get an average of some function with a definite integral. That is literally measuring car's speed at every instant!


I'm always amazed that people can get so hung up on events that happen in elementary schools. This piece feels like it could be posted to /r/iamverysmart.


> I'm always amazed that people can get so hung up on events that happen in elementary schools. This piece feels like it could be posted to /r/iamverysmart.

Ages were 10, 15 & ~23 respectively. Yes, the post suffers from selection bias. I will write more about my failures in the future.

And as per your request, I went and tried to post it to /r/iamverysmart, but only screenshot submissions are allowed and a screenshot of the whole post wouldn't be very effective.


There's a certain set of people who, by virtue of genetics, upbringing, or both, feel very strongly:

1. You should do your best to please authority figures.

2. You should do your best to be correct.

It's really distressing the first time they run into a clear example where those two deep-seated needs are in conflict. For a lot of gifted teacher's pet kind of kids, this is their Kobayashi Maru moment.

I still remember mine too. :-/


I love these kinds of logic puzzles. If you're up for losing an evening, the clips game [0] is one of my long-time favourites. It starts easy but puzzles 6 and 7 are tricky to do within the assertion limit

[0]: https://md5crypt.github.io/clipsgame/


That was quite an enjoyable way to lose an evening, thanks.


I can't even beat the first level.


It's because the puzzle has several facts that are relevant in the real world, but irrelevant in the puzzle, and the solution to the puzzle is a fact that is false in the real world. It's a terrible tutorial example.


I also struggle. :D I think `(eats tiger plant)` would be fine, but it isn't.


Ah, my smarter friend figured it out.

Hint: what is eatable?


I passed with `(is tiger eatable)`. I don't get it. :D


I'm not very good at this but I'll see if I can explain my thought process:

(is tiger fed) requires (eats tiger ?food)

which means there must exist ?what such that (is ?what eatable) and (is ?what ?food)

But in the fact table, nothing is actually eatable! We could have our tiger eat plants by asserting (eats tiger plants) and (is grass eatable), but that's two assertions. Tigers eat animals, and tigers ARE animals, so since we need to make something eatable we might as well do it on the only animals we have: (is tiger eatable)

I got level 2 as well but only after checking for hints in the source. I have been thoroughly nerd-sniped.


Okay, so it's not a typical tutorial. You have to golf and hack it. I'll give it a go again. Thank you for your explanation. :)


yeah, treat it more like a "reverse engineer a broken system" challenge than a "real world problem solving" challenge.


>If a card shows an even number on one side, then its opposite side is blue.

As the problem fails to specify that the card has a number on one side and a color on the other, you need to check three cards.


No, you don't. You still only have to check those 2 cards. If you're asking about card 3 then that's the common mistake – it doesn't matter whether that card has a number or an emoji on the other side, because the only logical test here is that if it has an even number on one side, the other side must be blue.

It doesn't matter what blue cards have on the other side, we don't need to test them.

edit: sorry, I just realised you're referring to card 1 and not 3, yeah I suppose you'd need to check it too unless there's a rule about always needing to match a color to a number.


What they mean is that the card that has a 3 on the front might have an even number on the back.


yep I just realised and edited my comment, I imagine the full question has a clearer description that OP cut down for the blog post.


To be fair and square, I added an explicit statement about consistency of the cards, so it is clear.


> The teacher was perplexed of why I said such wrong answer to such simple question.

This is extremely toxic on many levels. A teacher who doesn't know their subject matter or at least can't be bothered to refresh before a lesson. A teacher who submits to social pressure from their students when determining the answer. A teacher who shames students for being wrong. A teacher who shames students for being right!

The students who were wrong initially will, thanks to this teacher, have completely failed to learn the lesson the problem was teaching. They will remember only that they got the problem right (the teacher said so) and only due to some technicality in the answer book were they subjected to an embarrassing "lesson" from the smart kid.

Yes, this is making a mountain of a molehill, but molehills like this can have mountainous impacts on society.


I mean, there’s a lot going wrong with this “lesson.” The exercise is pretty clearly designed to trick people, which is very counterproductive if the goal is to teach people about multiplication or fractions or whatever. It’s really not that damning that a teacher would be fooled into skipping the actual work, since that deceit is clearly the entire point of choosing those numbers.

It’s a bit like if a proofreading exercise in an English class deliberately did that trick where you put two “the”s next to each other but across a line break. That will often fool a skilled English writer, but that says nothing about the writer’s skill.


While the teacher in the story obviously bungled it, I disagree with your interpretation of the lesson. The point is not to "teach people about multiplication or fractions or whatever". The point is to teach them to recognize dimensional parameters in a natural scenario. This can be counter-intuitive; we have a bias towards linear relationships. Somebody twice as tall is twice as heavy, thinks the child.

Tricks ("surprises" is a less negative descriptor) are important because they wake your brain up. Most people don't like learning arbitrary things for their own sake, but they don't like being fooled either.

Describe the base rate fallacy to a room of college students and it might stick for the rest of the semester, but give them this problem[0] and tell them how many licensed doctors it tricked (after you let them answer incorrectly, of course) and they might remember it for a while longer.

0. http://pi.math.cornell.edu/~mec/2008-2009/TianyiZheng/Bayes....


If the point of the lesson truly was to identify that the exercise was chosen deliberately to be deceptive, then sure. But I doubt it. Your example is very different, because there’s no obvious answer that one might spit out instantly by doing naive pattern matching.


The fact that there is an obvious but incorrect answer that you might spit out if you were just pattern matching is the whole point. It tricks you into being wrong, which "tricks" you into wanting to know why, where before you might not have cared about the answer or the underlying reasoning at all. You think the question just accidentally picked tricky numbers? Or you think the reason that it picked them was just malice towards children?


> You think the question just accidentally picked tricky numbers? Or you think the reason that it picked them was just malice towards children?

I think it's probably just abysmal pedagogy. This should be pretty clear given that almost the entire class and even the teacher fell for the trick, which shows that it's a terrible way to teach something and even a terrible way to test if they know how to do the thing you're wanting them to learn. The teacher and many of the children probably did in fact know how to do the calculation and solve the problem, and would have done so if the numbers weren't very deliberately chosen to invoke a hasty pattern-matching reaction.


So which is the correct way to choose parameters for a problem?

1. Deliberately choose numbers that have no notable relationships with each other that could mislead students.

2. Randomize parameters in an effort to introduce only as many trick questions as would occur randomly in nature (so, given small-valued problems like this one, maybe one question in a hundred will have tricky parameters by chance).

3. Deliberately choose parameters that are somewhat or even maximally tricky to ensure students' understanding is robust in the face of misdirection.

4. A deliberate mix of 1 and 3, with easier questions to start and trickier questions always included but never too often and never too early.

You seem to be advocating for #1, or possibly #2 (though I get the impression that if #2 were employed and trick parameters arose by chance on a quiz you would complain, bringing us back to #1). #4 seems like the best strategy to me, with an extra helping of tricks for my kids, please. It's not like you learn less when you get tricked. Quite the opposite!


It's not nearly as complicated as all that. Just don't deliberately try to trigger a misleading pattern match. Almost any conceivable way you would naturally come up with a math exercise like this would not end up fooling people into giving an immediate incorrect answer. You have to go out of your way (or get very unlucky if you're choosing random parameters) to construct an exercise that will fool a large portion of people.


> You have to go out of your way (or get very unlucky if you're choosing random parameters) to construct an exercise that will fool a large portion of people.

That's not remotely true. You're generally picking small integers, here, and frequently dealing with problems that have very few parameters.

"Species of birds in the genus Madeupicus all have the same proportions. Adult birds of one species in the genus have a wingspan of one meter and weigh one kilogram. How much do birds of a species in the genus with a two meter wingspan weigh?"

This is a very similar problem to the one in the original example. It has the same trick, which is that the integers on either side of the equation are the same in the first example, but they will not be the same in the second equation due to dimensional scaling.

Students do hundreds of math problems per week. Randomly-selected small integers will collide frequently.

And that's all ignoring the benefits of surprising people, which you still tacitly deny. What are you optimizing your questions for, "fairness"? "Predictability"? That sounds like a great way to ensure as many people as possible who didn't actually understand the subject matter will pass your tests.


> That is literally measuring car's speed at every instant!

Well, there's no way to measure speed instantly; it's always measured over some period, even if small (take some time interval and see how much distance has been covered). For example, at time zero you cannot know what speed the car has.

Similarly, when I was working with audio, producing a frequency diagram -- the frequency cannot be gathered from a single timestamp on the waveform; you always need some interval, like 20ms, to get an FT.


Speedometer should be enough. :)


The cards example doesn't work in Firefox, it reveals the opposite side of the card :/


Oh, this browser again.

It seems that Firefox might have a problem with `mix-blend-mode: multiply;` and 3D CSS transforms.

Cards are now correctly behaving in Firefox, but across the board they are visually less appealing.

Thank you for bringing this up.


Ah, so it's my Firefox :)


Concur. I was very confused until I decided to reopen in chromium. %-)


Absolutely hilarious how people claim that Gecko isn't half broken anymore.


1. I had a similar issue in the fifth grade of elementary school (in my country, elementary school is 6 years with an extra year at about age 5 before pre-elementary school, then three years of middle high, then three years of optional high school before university, but almost everyone does the full 12/13 years), when we discussed Fahrenheit (foreign) vs Celsius (domestic). The teacher said that 50°C is 106°F because 212°F is 100°C and 212/2 = 106. At the time I said "this doesn't feel right, 106 is not in the middle between 32 and 212" but I couldn't phrase it mathematically yet, and the teacher didn't take my objection well. Afterwards I learnt that he understood his error but never said a word to me about it.

2. I don't see the colours on the “back sides” of the cards. I just see the cards flipped/rotated 180° around the Y-axis (probably through CSS), so all numbers are visible, just in cards 3 & 4 the numbers are mirrored.


2. Firefox user, right? It should be fixed now. Thank you for reporting this issue.

1. You reminded me of another nice example of wrong shortcuts. The average speed of a car from city A to city B is 120km/h. The average speed back (B -> A) is 80km/h. What is the average speed of the car on the whole path (A -> B -> A)? It's not 100km/h.


To be pedantic, 100km/h is the arithmetic mean. Of course this is the wrong tool to use (which is your point) but 100km/h is the correct answer if by mean you intend to say arithmetic mean.


> It's not 100km/h.

Of course it is... if you're averaging over distance. There's no reason that's less valid than whichever average you have in mind.


How do you arrive at 100km/h if you're averaging over distance?

I don't doubt you're right, but I don't see it. My calculation is as follows: suppose the distance between both cities is 120km. The trip A -> B thus takes exactly one hour. The trip B -> A takes 1.5 hours (120km / 80km/h) = 1.5h. The total trip therefore takes 2.5h, the total distance is 240km, which averages at 96km/h (240/(1 + 1.5)).

A simple (V1 + V2)/2 is surely possible, but I don't see the value of that average, I can't think of a situation where it would be useful.


You are right. Harmonic mean is the right tool.
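Spelled out (driving equal distances at each speed means the harmonic mean of the two speeds):

    v1, v2 = 120, 80                    # km/h each way, equal distances
    print(2 / (1 / v1 + 1 / v2))        # 96.0 km/h, matching the distance/time calculation above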


I think you are thinking about a case where someone goes 120km/h for 1 hour and then 80km/h for 1 hour instead of going over a fixed distance at 120km/h and then going over the same distance at 80km/h.


No. As I stated, the distance is the same.


I was responding to the person mentioning 100km/h as the correct answer (firstlink).


How did you learn that about the teacher?


From his nephew, one year older than me, in the sixth grade with whom we played football (soccer). He said that he heard his uncle talking to his mom.


Sorry, unrelated to the writing (which I enjoyed), but related to the horizontal scrollbar! If you change w-screen to w-full on both the navbar and footer, it goes away. 100vw -> 100%. Just a tiny enhancement to an otherwise good looking website.


> unrelated to the writing (which I enjoyed)

Aw, thank you. :)

> related to the horizontal scrollbar

Thank you very much, I was not aware of this issue. Changed both to `w-full`, hope scrollbar goes away. :)

BTW As a pizza aficionado I deeply enjoyed "wood fired pizza". :D


Why would anyone answer 12?

Unless... most human thinking is this lower-end language model, which looks at word patterns without giving much thought to them.

"12" is not intuition. It is repetition.


Why? Because if three hens laid three eggs in three days, then it means one hen lays one egg per day - so twelve hens, given twelve days, will lay twelve eggs. That's the immediate reasoning, right?

It's very easy to fall into that trap, and miss the fact that two of the variables were changed.


It's misapplied intuition for double ratios.

If a triangle has sides 3:3:3 in feet, in inches it has sides 36:36:36.


Interesting. So we can tell from the last example that humans are perfectly good at doing this reasoning in theory and that therefore the trouble is actually in interpreting what the question is asking.

Put another way, if the question could be beamed into someone's head they'd get it right. The flaw is in the part of the mind that loads the words into their brain and not the reasoning after that.


I don’t think this is the most accurate way to describe it. It’s considering the mind as separate from the brain. There’s also no true version of the problem to “beam” into the brain. That would just be a third version.

Our brains aren’t purely general purpose problem solvers. There are parts of the brain that are specifically related to facial recognition versus all images. We’re better at recognizing differences in faces versus buildings.

There seems to be a similar “hardware acceleration” for problems about people following rules (can’t remember the source for this). People generally do better on problems if you can frame them as who is breaking a rule, who is lying, etc. I think the last example would still be easier to solve on average even if it was a rule unfamiliar to people.


I think it shows that we suck at generalization. Even though the task is the same, we are not able to see the same pattern in an arbitrary framing of the same problem. Beaming directly might help.


I didn't get why he deprecated distance/time as a direct way of determining average velocity.

Is this pedantry about "average" vs. "mean"? As I understand it, "average" is a vague word, one of whose meanings is "mean". And I imagine most of the other meanings would have distance/time as their value as well.


Another tricky question is the one about 3 doors, two with goats behind them, and the 3rd one is hiding the prize. The host opens door #1 with a goat behind it, the doors #2 and #3 are still closed. Now what's the chance of the door #2 hiding the prize behind it? The intuitive answer is 50%, but it's wrong.


You're not fully describing the Monty Hall problem - as you've described it, the answer is indeed 50%.

In the Monty Hall problem:

1. You pick one of three doors.

2. The host opens one of the two doors you didn't pick to reveal a goat

3. You are given the opportunity to keep your original choice, or pick the other unopened door

In this situation, there is a 1/3 chance that the prize is behind the original door, and a 2/3 chance it's behind the remaining unopened door.
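And if intuition still resists, a quick simulation settles it (plain Python, standard 3-door setup):

    import random

    def play(switch, trials=100_000):
        wins = 0
        for _ in range(trials):
            prize = random.randrange(3)
            pick = random.randrange(3)
            # Host opens a door that is neither your pick nor the prize.
            opened = random.choice([d for d in range(3) if d != pick and d != prize])
            if switch:
                pick = next(d for d in range(3) if d != pick and d != opened)
            wins += (pick == prize)
        return wins / trials

    print(play(switch=False))   # ~0.33 – staying wins only if the first pick was right
    print(play(switch=True))    # ~0.67 – switching wins whenever the first pick was wrong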


You described a slightly wrong Monty Hall problem [0]. When I explain the solution to this problem, I always just add more doors. 1 million doors, 1 car, 999999 goats. You pick one door, and I show you 999998 goats. Only two doors are left. What is the chance you picked the car on the first guess?

0: https://en.wikipedia.org/wiki/Monty_Hall_problem


That's not how it goes. And, yes, the answer is 50% in this case.

The problem relies on first choosing a door (1). The host then opens a door that you did NOT choose (3) and asks if you want to switch your choice to door #2. In this case, the probability that the goat is behind door 2 is not 50%.

https://en.wikipedia.org/wiki/Monty_Hall_problem


That's still 50% though. Now, if you had decided to open door #2 before the host opened #1, you may increase your chances by changing to door #3.

The chances of a prize being behind either door are exactly 50/50; it's the chance that you made the right choice in the beginning that is only 33%. That is called the Monty Hall PARADOX for a reason.


The answer is 0 because the prize is behind the third door #3, but I feel you're either missing a couple of details or you're trying to mess with people who know Monty-Hall.



