L2 Regularization and Batch Norm (janestreet.com)
159 points by signa11 on July 23, 2019 | 15 comments



Myrtle has a similar series that includes discussion about batch normalisation: https://myrtle.ai/learn/how-to-train-your-resnet/

https://twitter.com/dcpage3/status/1141700299071066112

Disclaimer: I work at Myrtle!


I looked at it, and it is very good: solid baselines, explanations, and visualizations, going deeper than the typical "it is a black box, but if you copy & paste it will work".

I was surprised by the nice network vis (and I did dive into the subject before: https://medium.com/inbrowserai/simple-diagrams-of-convoluted...). The only thing that looks clunky is the text logs for training (a shameless plug: https://github.com/stared/livelossplot).


Jane Street is noted for its use of OCaml, so it's interesting to see that their researchers do indeed use Python (judging from the code in that post, at least).


They've also used OCaml for deep learning: https://blog.janestreet.com/deep-learning-experiments-in-oca...

Although it seems to be an OCaml binding to TF, rather than a native implementation.


There’s a similar tch-rs project wrapping libtorch, and in general, declaring neural networks is particularly intuitive in functional languages.
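As a rough illustration of that declarative style (a minimal NumPy sketch of function composition, not tch-rs or the OCaml bindings themselves):

```python
import numpy as np

# Declaring a network as a composition of functions: the style that
# maps naturally onto functional languages and wrappers like tch-rs.
def linear(w, b):
    return lambda x: x @ w + b

def relu(x):
    return np.maximum(x, 0.0)

def sequential(*layers):
    def forward(x):
        for layer in layers:
            x = layer(x)
        return x
    return forward

rng = np.random.default_rng(0)
model = sequential(
    linear(rng.normal(size=(8, 16)), np.zeros(16)),
    relu,
    linear(rng.normal(size=(16, 1)), np.zeros(1)),
)

x = rng.normal(size=(4, 8))  # batch of 4 samples, 8 features each
print(model(x).shape)        # (4, 1)
```

The network is just data (a tuple of functions), which is what makes this style read so cleanly in OCaml or Rust.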


There is a project that aims to bring NN/ML/DL/RL (along with other scientific computing) into the OCaml world: Owl [1]. They also have a list [2] of potentially interesting ideas for new contributors to take on.

[1] http://ocaml.xyz/

[2] http://ocaml.xyz/project/proposal.html


Could it be that Python is being used for rapid prototyping but OCaml for production? I've heard of several trading firms with this approach.


Wow, I didn't even know Jane Street posts these kinds of articles. I always assumed they were super secretive.


This is a popularization of things already published in open papers, so it does not reveal anything specific about their activities. Any place employing deep ML practitioners could have written this.

It could even be a red herring: the most popular application of batch norm is to deep CNNs, and those are mostly used on computer vision problems. CV does not seem important for option pricing, which is AFAIK Jane Street's big money maker. Of course, I could be very wrong about this. People have tried image data as auxiliary inputs alongside financial data, and you can also apply deep CNNs to 1D data like timeseries - see WaveNet applied to timeseries forecasting.
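For the curious, the WaveNet building block mentioned above is a causal dilated 1D convolution; a minimal NumPy sketch (an illustration, not WaveNet itself) looks like:

```python
import numpy as np

# Causal dilated 1D convolution: the building block WaveNet stacks to
# apply CNNs to timeseries. Each output y[t] depends only on x[t] and
# inputs `dilation` steps (and multiples) in the past, never the future.
def causal_dilated_conv1d(x, kernel, dilation=1):
    k = len(kernel)
    pad = (k - 1) * dilation                 # left-pad so the conv is causal
    xp = np.concatenate([np.zeros(pad), x])  # zeros before the series starts
    return np.array([
        sum(kernel[j] * xp[pad + t - j * dilation] for j in range(k))
        for t in range(len(x))
    ])

x = np.arange(8, dtype=float)
y = causal_dilated_conv1d(x, kernel=np.array([0.5, 0.5]), dilation=2)
# y[t] = 0.5 * x[t] + 0.5 * x[t - 2], with x treated as 0 before t = 0
```

Stacking these with exponentially growing dilations (1, 2, 4, 8, ...) is what gives WaveNet-style models their long receptive field over a series.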


I wasn't familiar with batch normalization before, but I've had to do something similar in Stan to enforce that some model parameters (not the data) had exactly a mean of 0 and a standard deviation of 1.
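For anyone unfamiliar, the transform in question (a minimal NumPy sketch, not the actual Stan code) just centers and rescales a vector:

```python
import numpy as np

# The core transform: subtract the mean, divide by the standard deviation,
# so the result has (approximately) mean 0 and standard deviation 1. Batch
# norm applies this per-batch to activations; the Stan trick above applies
# the same constraint to parameters.
def standardize(v, eps=1e-8):
    return (v - v.mean()) / (v.std() + eps)  # eps guards against zero variance

v = np.array([2.0, 4.0, 6.0, 8.0])
z = standardize(v)
# z.mean() is ~0 and z.std() is ~1
```

Batch norm additionally learns a scale and shift (gamma, beta) after this step, which is part of why its interaction with L2 regularization is subtle.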


David Wu wrote KataGo, which is doing new and exciting stuff in Go AI. He's writing a ton of code and running even more experiments.


Have you got a reference for KataGo?

edit: nevermind https://blog.janestreet.com/accelerating-self-play-learning-...


Excellent article! I'd be really curious to see a treatment of the same topic, but with ADAM.



[flagged]


Wow. You straight up copy-pasted the top Reddit comment on this article from 5 months ago [0]. Funny thing is that the article mentions making corrections due to that comment (also 5 months ago), so your stolen comment isn't even relevant anymore.

Not cool.

[0] https://www.reddit.com/r/MachineLearning/comments/aler62/d_l...



