Lessons Learned Reproducing a Deep Reinforcement Learning Paper (amid.fish)
230 points by fishfish on April 9, 2018 | 22 comments



"Switching from experimenting a lot and thinking a little to experimenting a little and thinking a lot was a key turnaround in productivity. When debugging with long iteration times, you really need to pour time into the hypothesis-forming step - thinking about what all the possibilities are, how likely they seem on their own, and how likely they seem in light of everything you’ve seen so far."

Not to glorify bygone days, but isn't there a certain charm to having to wait your turn to plug punch cards into a mainframe?

Having a "single shot", with long iteration cycles, indeed does something strange to the way you approach programming and bugs and program design.

Anecdote: my long iteration cycles were caused by my mom restricting access to my C64. I filled page after page (paper!) with code, anxious for any chance to try it. I like to think I learned something during these "mental dry runs"…

On topic: I wrote a rant about the "Mummy Effect" of trying to reproduce ML papers (which we do regularly): https://rare-technologies.com/mummy-effect-bridging-gap-betw...


In my first programming class (1981) - FORTRAN! - we keypunched cards, handed them to a clerk to run, and waited hours for the results. We learned to be careful and to think things through. But nothing like Knuth...

"When I was at Stanford with the AI project [in the late 1960s] one of the things we used to do every Thanksgiving is have a computer programming contest with people on research projects in the Bay area. The prize I think was a turkey.

[John] McCarthy used to make up the problems. The one year that Knuth entered this, he won both the fastest time getting the program running and he also won the fastest execution of the algorithm. He did it on the worst system with remote batch called the Wilbur system. And he basically beat the shit out of everyone.

And they asked him, "How could you possibly do this?" And he answered, "When I learned to program, you were lucky if you got five minutes with the machine a day. If you wanted to get the program going, it just had to be written right. So people just learned to program like it was carving stone. You sort of have to sidle up to it. That's how I learned to program." [0]

[0] http://www.softpanorama.org/People/Knuth/index.shtml


As they say, a few days in the laboratory might save a few hours in the library!

Alternately, a few days of doing might save a few hours of thinking. :-)


The flavour I'd always heard was "a day of coding saves an hour of planning."


Hamming, of Hamming code fame, was saying exactly this in one of his lectures. His computer was slow enough that one iteration in finding the optimal trajectory for a missile took several minutes, enough to allow him to get insight into what the 'curious' results meant.

(In that particular story, it meant that for a ballistic missile the optimal trajectory was to shoot it straight up and steer it on the way down instead of steering it on an elliptic path from the get go)


Where can we find more about this story?


Look up "you and your research" on youtube. He mentioned that story in a couple of lectures, one time in the one about simulation:

https://youtu.be/O5Ml5kPouG8?list=PL2FF649D0C4407B30&t=688

I recommend the whole course though.


The REPL is often misused and is now a curse.

The idea behind the REPL is fast iteration in development, but the key thing missing is thinking. Most people write code at the REPL just to see what will happen. They don't formulate a hypothesis before hitting the enter key. It's more of a guess-driven development methodology: keep trying till it works.

If you want to improve your programming experience, slow down on the run/compile cycle. Think it through.


I've been facing similar issues recently and have been slowly drifting towards ideas similar to those mentioned in the article. Though, coming from a more traditional software dev background, I'm still in the process of internalizing how the popular start-up mantra of 'build fast and fail fast' doesn't really work in ML.

Also, logging every experiment is another discipline I'm teaching myself. And I fell in love with org-mode in the process.


When I was doing my PhD, I once resorted to turning on all logging, printing about 60 pages of the resulting output, taping them to the floor of the lab, and crawling around on all fours with a printout of the source code tracing the execution to try and prove to myself that a result I was seeing wasn't due to just a bug in the source code. In an odd way, it's maybe my fondest memory from my dissertation work.


Computers have incredible potential in helping us think about and solve problems. Most programming environments have difficulty surfacing that potential, however (they are think-to-program rather than program-to-think). Especially in machine learning, it is interesting to wonder what kind of tool on a computer would help us solve problems if it isn't a programming environment or notebook/playground.


I'd say this is generally good advice in science & engineering. My EE and ME coworkers are forced to do this because each cycle is so slow and expensive.


A lot of these points apply to non-RL and non-Deep-Learning projects as well. I'm currently working on a machine learning project that has nothing to do with RL, neural networks, Tensorflow, or GPUs, but I have encountered many of the same surprises and have converged on many of the same solutions and insights. Here are the ones that I think transfer over to general machine learning projects:

- Implementation time vs debugging time: I've spent vastly more time debugging and rewriting. Furthermore, I was surprised at how much time I spent even setting up infrastructure to allow debugging. Corollary: Don't take shortcuts in initial implementation. Literally every hack I put in in order to get something working quickly has come back to bite me and I've had to rewrite and refactor every one of them.

- Thinking vs doing, and being explicit: There are too many degrees of freedom to just try random shit. One of the biggest tools I found (like the author) was to write everything down. I have four+ notebooks filled with literal prose conversations with myself interspersed with diagrams and drawings. I credit that with getting me past many mental roadblocks.

- Measure and test: Without unit and integration tests, I'd probably go in circles forever. It's too easy to change something to fix a problem and have that break something else. I now have a test for even the most mundane assumptions in my model and my code, after too many times realizing "oh, before I changed X I could assume Y was true but that no longer holds".
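
To give a sense of what I mean by "mundane assumptions", here is a rough pytest-style sketch; all the loader and model names and constants are placeholders, not from my actual project:

    # Illustrative checks for assumptions that silently break as the code evolves.
    import numpy as np

    NUM_CLASSES = 10          # placeholder constant
    NUM_FEATURES = 32         # placeholder constant

    def test_labels_are_within_range():
        y = load_training_labels()              # placeholder loader
        assert set(np.unique(y)) <= set(range(NUM_CLASSES))

    def test_features_have_no_nans():
        X = load_training_features()            # placeholder loader
        assert not np.isnan(X).any()

    def test_model_output_shape():
        model = build_model()                   # placeholder constructor
        preds = model.predict(np.zeros((1, NUM_FEATURES)))
        assert preds.shape == (1,)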

I feel like engineering for machine learning requires a different skill-set than traditional programming and even traditional AI programming. Taking a very calculated and deductive approach like the author suggests seems to be crucial.


This is exactly right.

When I was teaching machine learning classes, the biggest hurdle I had to get a lot of students over was their history and expectation that they can implement something, fix the compiler errors, fix the runtime crashes, and then just declare victory. Machine learning is basically the study of getting a computer to give you an answer you don't know, so you have to be in this constant state of defensiveness, never quite trusting that things are correct.


100% agree, but a key point is: any domain or environment that requires you to program this way (and it's not just ML, for sure), will take surprisingly long to implement anything. There were real gains in productivity from being able to shift out of the mainframe/punch-card era (think long, then test) to the laptop era (try things as part of thinking, in a quick iterative loop). When we have to give that up, things will take a lot longer than the experience of recent decades would cause you to expect.


This matches my programming experience. But legends like John Carmack give different advice, and it's hard to ignore people like them.


A good way to get around the long iteration cycles is to develop on subsets of the data or on simpler tasks. With computer vision you can either start with something like MNIST (which might not always be ideal, because you'll need to adjust your architecture) or take a subset of about 1,000 examples from your real training set and get the model to overfit it.
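
As a concrete illustration of the overfit-a-subset trick, a minimal PyTorch sketch might look like this (full_train_set and build_model are placeholders for your own data and architecture):

    import torch
    from torch.utils.data import Subset, DataLoader

    # Take ~1000 examples from the real training set.
    small_set = Subset(full_train_set, range(1000))
    loader = DataLoader(small_set, batch_size=64, shuffle=True)

    model = build_model()
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = torch.nn.CrossEntropyLoss()

    # Train until the model memorizes the subset; if training loss
    # doesn't approach zero, suspect a bug before scaling up.
    for epoch in range(200):
        for x, y in loader:
            opt.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()
            opt.step()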

Another thing that helps is having a way to queue up multiple experiments. It's not that hard to set up with Celery.
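
Something along these lines, roughly (the broker URL and train_and_evaluate are assumptions, not a prescription):

    from celery import Celery

    app = Celery("experiments", broker="redis://localhost:6379/0")

    @app.task
    def run_experiment(config):
        # train_and_evaluate is a placeholder for your training entry point.
        return train_and_evaluate(config)

    # Enqueue a sweep; a worker process picks the jobs up one at a time.
    for lr in [1e-2, 1e-3, 1e-4]:
        run_experiment.delay({"lr": lr})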

Also, I'd really recommend starting with the dumbest baseline model possible and getting the evaluation pipeline around it working. You can then focus on iterating on it and measuring how much each change moves the needle.
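
For example, a scikit-learn DummyClassifier wired through the same evaluation code as the real model (the X/y variables here are placeholders):

    from sklearn.dummy import DummyClassifier
    from sklearn.metrics import accuracy_score

    baseline = DummyClassifier(strategy="most_frequent")
    baseline.fit(X_train, y_train)
    print("baseline accuracy:", accuracy_score(y_test, baseline.predict(X_test)))
    # Every later change can be measured against this number.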


What is the tool you are using to write your 'logs'? (pictured in this image http://amid.fish/images/rl_logs.jpg)


That looks like a Jupyter notebook, or the FloydHub service he talks about mid-article.


I'm pretty sure that's Bear, a great markdown based notes app.

http://www.bear-writer.com/


Yeah, Bear :)


>Switching from experimenting a lot and thinking a little to experimenting a little and thinking a lot was a key turnaround in productivity.

I think a corollary might be: know your data and the processing steps from the paper very well.



