Bifurcate the Problem Space (potetm.com)
70 points by potetm on Feb 22, 2023 | hide | past | favorite | 33 comments


From The Devouring Fungus (Karla Jennings, 1990), chapter 6:

[…] Imagine the frustration of the Stanford University instructor with an especially thick repairman who kept showing up to fix the same computer. He’d stay for a few hours, poke around, announce the machine fine, and leave; but he never knew what he was doing, so the instructor kept calling him back because the machine was never fixed.

Finally the instructor got so mad that he said “Let’s do a binary switch. The problem has got to be one of the circuit boards.” The repairman didn’t know what a binary switch was, but he agreed because he had an angry customer.

A binary switch involves taking half the circuit boards from the machine having trouble and switching them with identical boards in another machine. If the problem’s been transferred to the new machine, you know that something’s wrong with the switched boards. Then you reswitch smaller and smaller numbers of boards until you pinpoint which board is defective.

They switched half the boards and, sure enough, the problem moved to the other machine. The repairman said, “So what do we do now?” So they swapped half of the changed boards. The trouble was that now both machines worked. (Perhaps shaking the boards around had put a loose contact back in place.)

“That’s great!” said the repairman. “I’m going to use this procedure from now on. I never knew you could fix a machine just by swapping boards!”


I expected the story to end with neither machine working :)


I was fortunate enough to learn this strategy outside of a software context first. In grad school I was responsible for maintaining our GC-MS and LC-MS machines (Gas/Liquid Chromatograph / Mass Spectrometer) in addition to using them for my own experiments on a regular basis. They were complex pieces of equipment that had a ton of fiddly components.

The best way to troubleshoot these machines was to split the problem in half again and again until you found the part that needed to be cleaned, repaired, or replaced. The way to do that was to run different known chemicals with their own known signatures through the machine again and again. For example, you could run a single chemical (maybe ethanol - I honestly don't remember which ones at this point) and see if it goes straight through the GC and gives you the peak you expect. If it doesn't, you could look at the result and see if the timing was off (indicates something with the GC or gas flow) or if the peak was wonky (indicates something with the MS). And then you just keep going. (Sounds a lot like unit and integration tests, right?)

Applying that same strategy works wonders for data pipelines as well. Is it something with the extractor or loader (or god forbid Airflow)? Break it down and go from there.

My only nit on this is that while "bifurcate" is technically accurate (or is it, actually?), it feels unnecessarily complex-sounding for folks learning this skill.


I've always used "divide and conquer" which is a cliche for a reason.

Bifurcate is commonly used by Indian colleagues, and perhaps programmers are familiar with the term, but it's not something used by US business people in my experience.


Also, binary search


I more often heard and used "bisect". Is that subtly different or subtly the same?


It's the same, e.g. the git bisect command is just a binary search API for a commit history.


To finish the thought explicitly ~

If one has pretty good automated tests, it’s possible to automatically pinpoint the last commit with the working version and the first commit with the test failure by using git bisect with the test results as input.
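A minimal sketch of that workflow, assuming the test suite can be run as a single command that exits non-zero on failure (the commit id and script name below are placeholders):

    git bisect start
    git bisect bad HEAD              # current commit fails the test
    git bisect good abc1234          # a commit known to pass
    git bisect run ./run_tests.sh    # git checks out midpoints; exit 0 = good
    git bisect reset                 # return to the original HEAD when done

With N candidate commits in between, git only has to run the tests about log2(N) times.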


I can't find this post from a physicist, but they had their bicycle stolen and the police had given up searching the CCTV log because it was taking them too long, despite the fact that the camera was pointed at the bicycle.

He told them that even if the CCTV log went back to the earth's formation, they would only need to take about 57 stills to pinpoint the second the bike was stolen (apparently they were not persuaded).


Am I missing something? How is that?


2^57 seconds is about 4.5 × 10^9 years, which is roughly the age of the Earth. They would only need that many stills because it would be possible to do a binary search on the footage, as each frame either has the bike in it or does not.
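A quick sanity check of that figure (throwaway Python arithmetic):

    # 2^57 seconds expressed in Julian years (365.25 days each)
    years = 2**57 / (365.25 * 24 * 3600)
    print(f"{years:.2e}")   # ~4.57e+09 years, roughly the age of the Earth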


Maybe I'm just being dense, but I don't see how that would work. The bike's presence is not a monotonic property -- so you never know which half of the search space you need to descend into.

(In fact, you don't even know if a bike was ever there without already having identified a frame with the bike in it. After all, the physicist could be lying.)


Probably exaggeration for effect, don't take it too literally. But if the bike had been there from the beginning of time, finding the second where it disappeared requires no more than 57 frames to be checked. So if searching billions of years of history only requires checking 57 frames, why is it so hard to check a day or a week or a few hours worth of data? (Of course, if this was before digital video records it could actually be tedious, but hardly impossible, to check even a few days worth of data since that may force a linear scan at least partially.)

Given that they knew when the bike was there (presumably the physicist knew when they locked it up or near enough) they could have found that point and looked forward, and it would have taken far fewer than 57 frames to identify where it disappeared if you're only interested in getting down to the second.

So if they knew the bike was present at 1pm, and it was gone by 3pm (hypothetical since not enough information is given), then they can do a binary search on that 2 hour window, which is only 7200 seconds worth of frames. Start at 2pm: is it present? If so, flip to 2:30; if not, 1:30. Repeat. Even a week is only 604k seconds, which would require no more than 20 frames to be checked.
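The search being described is an ordinary binary search over timestamps. A minimal Python sketch, assuming a hypothetical bike_present(t) check (a human looking at the frame for second t) and that the bike disappears only once:

    import math

    def first_missing_second(lo, hi, bike_present):
        # Invariant: bike present at second `lo`, absent at second `hi`.
        while hi - lo > 1:
            mid = (lo + hi) // 2
            if bike_present(mid):
                lo = mid   # still there: the theft happened later
            else:
                hi = mid   # already gone: the theft happened earlier
        return hi          # earliest second at which the bike is absent

    # A 2-hour window needs at most ceil(log2(7200)) = 13 checks,
    # a week at most ceil(log2(604800)) = 20.
    print(math.ceil(math.log2(7200)), math.ceil(math.log2(604800)))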


The bike could have disappeared, then re-appeared and disappeared again, tho. So you might be finding the "wrong" thief.

(Btw, I realize it's just an anecdote, I'm just being extremely nitpicky.)


Ah, I believe the confusion was thinking that 57 snapshots spread evenly over said billions of years would be enough to deduce the second. But instead they mean that if you used binary search (or similar) across ALL of the images, you could find it within 57 or so checks.


I consider myself a "good" debugger, and I don't really know why. Reading this makes me think that I probably have several strategies, including this one, that I subconsciously use without having a label for them.


I really like this book (and have recommended it before on hn) http://debuggingrules.com/?page_id=31 - it helps to have a shared team vocab for some of the concepts.

There's a summary set of slides here that gives a great overview: https://courses.cs.washington.edu/courses/cse474/18wi/pdfs/l...


This is Troubleshooting 101; "Divide and Conquer."

My original training was as an electronic technician; specifically, an RF tech.

We would debug through a signal path.

The very first thing they taught us to do was stick a probe in at the halfway point and see if the signal was what it was supposed to be.


Sure, and of course this is the essence of printf-debugging. You probably have a rough idea where the problem is, so throw in some printf statements to narrow it down, then maybe a 2nd set/etc to further narrow it as needed.

A brute force (non-targeted) binary search may always be an option if you have a problem where you don't even know where to look.


The principle reminds me of a “bifurcation algorithm” I found to bifurcate and explore parameter space in a sort of novel way. It has its drawbacks! But it was fun to tinker with, too.

Effectively, we define some function to map N -> B, where N is the natural numbers and B is in [0, 1]. (We can then apply some arbitrary function on B for our actual parameter value.) We find the initial values as 0, 1, 0.5, 0.25, 0.75; that is: the extrema, then all halves (1 new point), the missed quarters (2), the missed eighths (4), the missed sixteenths (8), and so on. I used some binary representation of N to get B; there are probably other ways to do it.

I found this pretty useful for bifurcating a high-dimensional parameter space – it meant that at all times I could get a decently even view of all points (with a bias towards B=0) in expanding detail. It also meant I didn’t have to think about allocation time, which was useful on a somewhat packed (but use-as-needed) compute cluster.
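One concrete way to realize the "binary representation of N" mapping described above is bit reversal, i.e. the base-2 van der Corput sequence. A sketch (not necessarily the exact scheme used), with the 0 and 1 extrema seeded separately:

    def van_der_corput(n, base=2):
        # Reverse the digits of n in `base` and place them after the radix point.
        b, denom = 0.0, 1.0
        while n:
            n, digit = divmod(n, base)
            denom *= base
            b += digit / denom
        return b

    # halves, then the missed quarters, eighths, sixteenths, ...
    print([van_der_corput(n) for n in range(1, 9)])
    # [0.5, 0.25, 0.75, 0.125, 0.625, 0.375, 0.875, 0.0625]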


Sounds a lot like quasi-random low-discrepancy sequences like Sobol or Halton. Very effective tool to accelerate Monte Carlo integrations, which boils down to the effectiveness of exploring the phase space (but you have to make sure that sequences in different dimensions are independent!).


Asking questions which yield more bits of information is good practice. Bifurcating a space creates higher entropy signals when you finally measure.

A greyscale camera can tell you white or black, so you could differentiate day from night. But a color camera can tell you red, white, blue, or black, so now you can derive sunrise, sunny, cloudy, or night. The set of observables is what you partition into signals.

Enhancing your observable space lets you derive stronger signals to act on.


I do this, because I like dumb solutions. This, along with challenging assumptions, is basically how I debug. I love git bisect, a lot of people don't even know it's there.

I'll note that, when pair programming, the people who sit and reason about the code often outperform me, so the way I look at it I'm trading time against complexity.


I work for a telecom, and I've heard techs suggest something similar to find/fix problems in the field. If you can't find the source of an intermittent problem, then change/replace something (anything!) to narrow scope for next time it happens.


Is this not bisection rather than bifurcation?


Bisection means dividing a curve into two equal parts.

Bifurcation means divide, split, or fork something into two.


But of course bisection doesn't just apply to curves, and bifurcation only refers to certain kinds of division. You wouldn't bifurcate a cake or a number, for example. I feel like it refers to a continuous process of splitting, e.g. in time, like a species that evolves into two species, or in space, like a road that forks into two roads.


I think the key difference is that bisect refers to slicing something into equally sized parts. This is sometimes the case (see e.g. git bisect), but bifurcating the problem space would be the more general term.


While bisect is indeed slicing into two (mostly) equal parts, bifurcation is creating two branches, and in my mind, with the intent of following up on both (at least initially). So what's done here is bifurcation with pruning, and I think bisect would be a more appropriate term -- certainly the attempt should be made to divide the problem space into equal parts.


aka divide and conquer, split the possibilities, isolate the problem, etc, etc.

'bifurcate the problem space' sounds so Silicon Valley.


The fact that the author is "commenting things out" makes me think that they don't have access to a debugger in their environment. When I find myself in that particular situation, I tend to:

* Add lots of logs (e.g. "entered function X with parameters a, b, c", "exited function X, return value W").

* "Make things explode in a useful way". That is, cause an error on purpose, which halts the program. This is useful for debugging server-like environments where many different things might happen at the same time, so the logs become insufficient. It is way slower than having a debugger doing step-by-step scrutiny, but it can be done.
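A minimal sketch of both ideas in Python (all names here are illustrative): a decorator that logs function entry and exit, plus an assertion that makes the program halt loudly the moment an assumption is violated:

    import functools, logging

    logging.basicConfig(level=logging.DEBUG)

    def traced(fn):
        # Log entry (with arguments) and exit (with return value).
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            logging.debug("entered %s args=%r kwargs=%r", fn.__name__, args, kwargs)
            result = fn(*args, **kwargs)
            logging.debug("exited %s -> %r", fn.__name__, result)
            return result
        return wrapper

    @traced
    def handle_request(payload):
        # "Make things explode in a useful way": stop the program as soon as
        # an assumption is violated instead of letting bad data flow onward.
        assert "id" in payload, f"malformed payload: {payload!r}"
        return payload["id"]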


"Commenting out" is often meant informally, not to be taken literally.

Your debugger may not work with a multithreaded program. Or it might cause the bug to disappear because it changes scheduling/execution order. Printf is more reliable in this context.

Or they may be debugging hardware, which the debugger can't step into. Or debugging object code without symbols.

Or they may be debugging a distributed system with remote code execution and asynchrony.

Or they may be debugging a language that doesn't have an execution stack.

Or they may be debugging an issue that only happens in production or customer machines that they can't access directly.

Or they may be debugging an issue that only happens very rarely, longer than the patience of a programmer to wait for a breakpoint.

Debuggers are one tool we have available, but hardly the only tool, and definitely not a complete tool.


Perhaps I didn’t explain myself correctly. My point is that when debuggers are an option, they are preferable to print-based debugging. I am aware that they are not always an option. I don’t think my comment implied that.



