Hacker News
Cognitive Toolkit beta for deep learning advances (microsoft.com)
181 points by mrry on Oct 25, 2016 | hide | past | favorite | 17 comments



CNTK from Microsoft Research has been rebranded as Microsoft Cognitive Toolkit.

Here's the github repo: https://github.com/Microsoft/CNTK

CNTK homepage (http://www.cntk.ai/) now redirects to https://www.microsoft.com/en-us/research/product/cognitive-t...

> CNTK (http://www.cntk.ai/), the Computational Network Toolkit by Microsoft Research, is a unified deep-learning toolkit that describes neural networks as a series of computational steps via a directed graph. In this directed graph, leaf nodes represent input values or network parameters, while other nodes represent matrix operations upon their inputs. CNTK allows to easily realize and combine popular model types such as feed-forward DNNs, convolutional nets (CNNs), and recurrent networks (RNNs/LSTMs). It implements stochastic gradient descent (SGD, error backpropagation) learning with automatic differentiation and parallelization across multiple GPUs and servers. CNTK has been available under an open-source license since April 2015. It is our hope that the community will take advantage of CNTK to share ideas more quickly through the exchange of open source working code.
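The "directed graph" framing in that description can be illustrated with a tiny sketch. This is plain Python with made-up class names, not CNTK's actual API: leaf nodes hold input values or parameters, inner nodes apply matrix operations, and evaluating the network is just a traversal of the graph.

```python
# Conceptual sketch of a network as a directed graph (not CNTK's API):
# leaves hold inputs/parameters, inner nodes apply matrix ops.
import numpy as np

class Leaf:
    def __init__(self, value):
        self.value = np.asarray(value, dtype=float)
    def forward(self):
        return self.value

class MatMul:
    def __init__(self, a, b):
        self.a, self.b = a, b
    def forward(self):
        return self.a.forward() @ self.b.forward()

class Plus:
    def __init__(self, a, b):
        self.a, self.b = a, b
    def forward(self):
        return self.a.forward() + self.b.forward()

# y = x @ W + b, expressed as a graph and evaluated by traversal
x = Leaf([[1.0, 2.0]])               # input (leaf node)
W = Leaf([[1.0, 0.0], [0.0, 1.0]])   # parameter (leaf node)
b = Leaf([[0.5, 0.5]])               # parameter (leaf node)
y = Plus(MatMul(x, W), b)
print(y.forward())                   # [[1.5 2.5]]
```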


> CNTK allows to

OT pet peeve: the tech culture's distaste for pronouns has made this all too common. It doesn't even have to be a pronoun: "users", "people", "clients", etc. all work -- but without one, the thing the author is referring to is never specified. That offends me as a technical person far more than pronouns ever will.

In particular, project descriptions and readmes are rife with "X allows to".


Finally Python bindings! I wanted to use this because Tensorflow is impossible on Windows, but the lack of programming language bindings made it a non-starter. Glad this is finally here.


Besides the rebranding, the Python bindings seem relatively new (2 months). The docs seem to imply the API is pretty high-level compared to other frameworks: https://www.cntk.ai/pythondocs/

One interesting note is that there seem to be plans to create a Keras backend that lets you run Keras models on CNTK: https://github.com/Microsoft/CNTK/issues/797


CNTK contributor here - Keras indeed is pretty high on our list of things to cover soon. But then, all our code is out there on GitHub and we welcome PRs :-)


More information about Cognitive Toolkit is available here https://www.microsoft.com/en-us/research/product/cognitive-t...


How does it compare to TensorFlow?


Here's an article comparing TensorFlow to an earlier version, when it was still an MSR project called CNTK: https://esciencegroup.com/2016/02/08/tensorflow-meets-micros.... The author concluded that they both seemed very useful; he dinged CNTK for its lack of Python bindings at the time but that seems fixed now.


CNTK has significantly higher performance on one or more machines, with great multi-GPU scalability. You can train on harder, bigger datasets given the same resources.


There's a saying: lies, damn lies, and benchmarks. In this case, AFAIK no one has replicated these claims.

MS hasn't gotten it into the standard CNN inference benchmark yet, though: https://github.com/soumith/convnet-benchmarks. This is the benchmark that showed how slow TensorFlow was compared to non-Google libraries until Google fixed it.


Can someone clarify this? In my head "one or more machines" means "always". Does CNTK generally have higher perf even on a single machine? Or is ajwald trying to say it is better at scaling to multiple machines?


CNTK user & contributor here. CNTK overall has very low framework overhead and has tensors with dynamic axes as first-level citizens. This means that sequences can be expressed without needing to do padding, sorting of the input data, or any other workarounds, and can be packed automatically by the toolkit in an optimal way. In particular, while it is laying out the rectangular structure it uses to traverse multiple RNNs of a minibatch in parallel, it fits shorter sequences into the holes and can reset the RNN state for these sequences while it is traversing this structure. This makes CNTK especially suitable for expressing RNN models (for CNNs many of the calls are just forwarded to CuDNN, so the difference might be much lower).
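The packing behavior described above can be illustrated with a toy sketch. This is a hypothetical helper, not CNTK's internals: variable-length sequences are placed into a fixed number of parallel "lanes" so an RNN can step across all lanes at once, with shorter sequences filling the holes left by longer ones and a state reset recorded at each sequence boundary.

```python
# Toy illustration (not CNTK internals) of packing variable-length
# sequences into parallel lanes instead of padding all of them to the
# longest length.
def pack(sequences, num_lanes):
    # Greedily place each sequence into the lane that currently ends
    # soonest; record where each sequence starts so the RNN state can
    # be reset at that position.
    lanes = [[] for _ in range(num_lanes)]
    resets = [[] for _ in range(num_lanes)]  # sequence-start positions per lane
    for seq in sorted(sequences, key=len, reverse=True):
        lane = min(range(num_lanes), key=lambda i: len(lanes[i]))
        resets[lane].append(len(lanes[lane]))
        lanes[lane].extend(seq)
    return lanes, resets

seqs = [[1, 2, 3, 4, 5], [6, 7], [8, 9, 10]]
lanes, resets = pack(seqs, num_lanes=2)
# Lane 0 holds the 5-step sequence; lane 1 holds the 3-step and 2-step
# sequences back to back, with a reset recorded at each boundary.
print(lanes)   # [[1, 2, 3, 4, 5], [8, 9, 10, 6, 7]]
print(resets)  # [[0], [0, 3]]
```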

As for distribution, a) it has an extremely simple way to run data parallelism (for CNTK 1 it was just using MPI and starting the worker with a few extra options. I think CNTK 2 will add this in a week or so to the Python bindings), b) it has 1-bit SGD and more recently BlockMomentum, which are just dead simple methods to use for distributing the gradients, and they just work. All of these are open source (though 1-Bit SGD and BlockMomentum are patented).
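The core idea behind 1-bit SGD (Seide et al.) can be sketched in a few lines. This is a hedged illustration of the quantization-with-error-feedback trick, not CNTK's implementation: each worker transmits only the sign of its gradient (one shared magnitude per tensor) and carries the quantization error forward into the next step, so nothing is lost on average.

```python
# Sketch of 1-bit gradient quantization with error feedback
# (illustrative only, not CNTK's implementation).
import numpy as np

def one_bit_quantize(grad, residual):
    g = grad + residual                  # fold in the error from the last step
    scale = np.mean(np.abs(g))           # one shared magnitude for the tensor
    quantized = np.where(g >= 0, scale, -scale)  # 1 bit per element + scale
    new_residual = g - quantized         # error feedback for the next step
    return quantized, new_residual

rng = np.random.default_rng(0)
grad = rng.normal(size=4)
residual = np.zeros(4)
q, residual = one_bit_quantize(grad, residual)
# q carries only signs (times a shared scale); q + residual reconstructs
# the original gradient exactly, so the error is re-injected next step.
```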


An algorithm for updating parameters is patented?


As a Windows user, it's incredibly difficult, if not impossible, to get TensorFlow working with GPU support in an IPython/Jupyter notebook. Hopefully this is easier.


Much faster to scale out for deep learning workloads


How general/efficient are these AD systems ?

New ops in tensorflow seem to be oriented towards forward-mode AD rather than reverse-mode (for which one needs a pull back op on a dual).
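The distinction can be made concrete with a minimal sketch (plain Python, framework-agnostic) for f(x, y) = x*y + sin(x): forward mode pushes a tangent through the computation with dual numbers, while reverse mode runs the computation once and then pulls the adjoint back to get all partials in one sweep.

```python
# Minimal contrast of forward- vs reverse-mode AD for f(x, y) = x*y + sin(x).
import math

# Forward mode: propagate (value, derivative) pairs, i.e. dual numbers.
class Dual:
    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot
    def __mul__(self, other):
        return Dual(self.val * other.val,
                    self.dot * other.val + self.val * other.dot)
    def __add__(self, other):
        return Dual(self.val + other.val, self.dot + other.dot)

def dsin(d):
    return Dual(math.sin(d.val), math.cos(d.val) * d.dot)

x, y = Dual(2.0, 1.0), Dual(3.0, 0.0)  # seed dx = 1 to get df/dx
f = x * y + dsin(x)
print(f.dot)  # df/dx = y + cos(x) = 3 + cos(2)

# Reverse mode: one forward pass, then pull the adjoint of f back
# through each recorded operation to get every partial at once.
def reverse_grad(xv, yv):
    a = xv * yv            # forward pass
    b = math.sin(xv)
    df_da, df_db = 1.0, 1.0  # backward pass: f = a + b
    dx = df_da * yv + df_db * math.cos(xv)
    dy = df_da * xv
    return dx, dy

print(reverse_grad(2.0, 3.0))  # both partials from a single backward sweep
```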


The fastest for distributed deep learning workloads... the proven toolkit for Microsoft production systems... and now Python support is native! Cognitive Toolkit rocks!!!



