L2 Regularization and Batch Norm (janestreet.com)
159 points by signa11 on July 23, 2019 | 15 comments



Myrtle has a similar series that includes discussion about batch normalisation: https://myrtle.ai/learn/how-to-train-your-resnet/

https://twitter.com/dcpage3/status/1141700299071066112

Disclaimer: I work at Myrtle!


I looked at it, and it is very good: solid baselines, explanations, and visualizations, going deeper than the typical "it is a black box, but if you copy & paste it will work".

I was surprised by the nice network vis (and I did dive into the subject before: https://medium.com/inbrowserai/simple-diagrams-of-convoluted...). The only thing that looks clunky is the text logs for training (a shameless plug: https://github.com/stared/livelossplot).


Jane Street is noted for its use of OCaml, so it's interesting to see that their researchers do indeed use Python (judging from the code in that post, at least).


They've also used OCaml for deep learning: https://blog.janestreet.com/deep-learning-experiments-in-oca...

Although it seems to be an OCaml binding to TF, rather than a native implementation.


There’s a similar tch-rs project wrapping libtorch, and in general, declaring neural networks is particularly intuitive in functional languages.
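As a rough illustration of that declarative style (a minimal NumPy sketch of function composition, not tch-rs or the OCaml bindings themselves):

```python
import numpy as np

# Declaring a network as a composition of functions: the style that
# maps naturally onto functional languages and wrappers like tch-rs.
def linear(w, b):
    return lambda x: x @ w + b

def relu(x):
    return np.maximum(x, 0.0)

def sequential(*layers):
    def forward(x):
        for layer in layers:
            x = layer(x)
        return x
    return forward

rng = np.random.default_rng(0)
model = sequential(
    linear(rng.normal(size=(8, 16)), np.zeros(16)),
    relu,
    linear(rng.normal(size=(16, 1)), np.zeros(1)),
)

x = rng.normal(size=(4, 8))  # batch of 4 samples, 8 features each
print(model(x).shape)        # (4, 1)
```

The network is just data (a tuple of functions), which is what makes this style read so cleanly in OCaml or Rust.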


There is a project that aims to bring NN/ML/DL/RL (along with other scientific computing) into the OCaml world: Owl [1]. They also have a list [2] of potentially interesting ideas for new contributors to take on.

[1] http://ocaml.xyz/

[2] http://ocaml.xyz/project/proposal.html


Could it be that Python is being used for rapid prototyping but OCaml for production? I've heard of several trading firms with this approach.


Wow, I didn't even know Jane Street posts these kinds of articles. I always assumed they were super secretive.


This is a popularization of things already published in open papers, so it does not reveal anything specific about their activities. Any place employing deep ML practitioners could have written this.

It could even be a red herring: the most popular application of batch norm is to deep CNNs, and those are mostly used on computer vision problems. CV does not seem important for option pricing, which is AFAIK Jane Street's big money maker. Of course, I could be very wrong about this. People have tried image data as auxiliary inputs alongside financial data, and you can also apply deep CNNs to 1D data like timeseries - see WaveNet applied to timeseries forecasting.
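For the curious, the WaveNet building block mentioned above is a causal dilated 1D convolution; a minimal NumPy sketch (an illustration, not WaveNet itself) looks like:

```python
import numpy as np

# Causal dilated 1D convolution: the building block WaveNet stacks to
# apply CNNs to timeseries. Each output y[t] depends only on x[t] and
# inputs `dilation` steps (and multiples) in the past, never the future.
def causal_dilated_conv1d(x, kernel, dilation=1):
    k = len(kernel)
    pad = (k - 1) * dilation                 # left-pad so the conv is causal
    xp = np.concatenate([np.zeros(pad), x])  # zeros before the series starts
    return np.array([
        sum(kernel[j] * xp[pad + t - j * dilation] for j in range(k))
        for t in range(len(x))
    ])

x = np.arange(8, dtype=float)
y = causal_dilated_conv1d(x, kernel=np.array([0.5, 0.5]), dilation=2)
# y[t] = 0.5 * x[t] + 0.5 * x[t - 2], with x treated as 0 before t = 0
```

Stacking these with exponentially growing dilations (1, 2, 4, 8, ...) is what gives WaveNet-style models their long receptive field over a series.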


I wasn't familiar with batch normalization before, but I've had to do something similar in Stan to enforce that some model parameters (not the data) had exactly a mean of 0 and a standard deviation of 1.
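For anyone unfamiliar, the transform in question (a minimal NumPy sketch, not the actual Stan code) just centers and rescales a vector:

```python
import numpy as np

# The core transform: subtract the mean, divide by the standard deviation,
# so the result has (approximately) mean 0 and standard deviation 1. Batch
# norm applies this per-batch to activations; the Stan trick above applies
# the same constraint to parameters.
def standardize(v, eps=1e-8):
    return (v - v.mean()) / (v.std() + eps)  # eps guards against zero variance

v = np.array([2.0, 4.0, 6.0, 8.0])
z = standardize(v)
# z.mean() is ~0 and z.std() is ~1
```

Batch norm additionally learns a scale and shift (gamma, beta) after this step, which is part of why its interaction with L2 regularization is subtle.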


David Wu wrote KataGo, which is doing new and exciting stuff in Go AI. He's writing a ton of code and running even more experiments.


Have you got a reference for KataGo?

edit: nevermind https://blog.janestreet.com/accelerating-self-play-learning-...


Excellent article! I'd be really curious to see a treatment of the same topic, but with ADAM.



[flagged]


Wow. You straight up copy-pasted the top Reddit comment on this article from 5 months ago [0]. Funny thing is that the article mentions making corrections due to that comment (also 5 months ago), so your stolen comment isn't even relevant anymore.

Not cool.

[0] https://www.reddit.com/r/MachineLearning/comments/aler62/d_l...



