
> Probably because that's not what it was designed for.

TF was explicitly designed with distributed training in mind (their initial whitepaper and the DistBelief paper that came before it make this clear) -- I don't know how you came to this conclusion.
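For reference, this is roughly what a between-graph distributed setup looks like in the TF 1.x API; the cluster layout, hostnames, and ports below are placeholders made up for illustration, not anyone's real config:

    import tensorflow as tf

    # Hypothetical cluster: one parameter server, two workers
    # (hostnames/ports are placeholders).
    cluster = tf.train.ClusterSpec({
        "ps": ["ps0.example.com:2222"],
        "worker": ["worker0.example.com:2222", "worker1.example.com:2222"],
    })

    # Each process starts a server for its own job/task.
    server = tf.train.Server(cluster, job_name="worker", task_index=0)

    # Variables land on the parameter server; compute ops on this worker.
    with tf.device(tf.train.replica_device_setter(
            worker_device="/job:worker/task:0", cluster=cluster)):
        x = tf.placeholder(tf.float32, [None, 10])
        w = tf.Variable(tf.zeros([10, 1]))
        y = tf.matmul(x, w)

That device-placement machinery was in the public API from early on, which is hard to square with "not designed for distributed training."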

Usually when people say TF is slow, it turns out they've introduced a serious bottleneck somewhere.
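A classic example (not necessarily what anyone here hit, just a common one): feeding every batch from a Python loop via feed_dict, so the graph sits idle waiting on Python, instead of using an input pipeline. A rough TF 1.x sketch, with the array shapes and sizes invented for illustration:

    import numpy as np
    import tensorflow as tf

    features = np.random.rand(10000, 10).astype(np.float32)

    # Bottleneck-prone pattern: every batch gets copied from Python via feed_dict.
    x = tf.placeholder(tf.float32, [None, 10])

    # Pipeline pattern: tf.data shuffles, batches, and prefetches on background
    # threads, so the session isn't stalled waiting on the Python loop.
    dataset = (tf.data.Dataset.from_tensor_slices(features)
               .shuffle(buffer_size=1000)
               .batch(128)
               .prefetch(1))
    next_batch = dataset.make_one_shot_iterator().get_next()

    with tf.Session() as sess:
        first = sess.run(next_batch)  # pulls a prefetched batch, no feed_dict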




Both points are true: TF is obviously designed for scalable, distributed training, but it is also heavily tied to Google's compute infrastructure (less so all the time, of course, but now it's also being closely tied to Google Cloud). So while I disagree with my colleague* that TF is "slow" or "not designed for distributed training," I support the slightly different (and implicit) argument that there are some settings (often in enterprise, I am learning) where it might not be as good a fit as other frameworks (e.g., DL4J, Caffe, whatever).

* Disclosure: I work with Skymind and contribute to DL4J, and I also use TensorFlow/Theano/keras heavily in my PhD research. I am an equal opportunity framework guy. ;)


To all these Skymind kool-aid drinkers, I won't bother arguing with you. I'll let the TensorFlow vs. DL4J usage numbers tell the story.

Spoiler: TensorFlow wins.


Spoiler: 95% of them are Udacity students without experience or budgets.


The environment for your large clusters is almost certainly different from that of everyone using TF outside of Google. I'm speaking about the problems they'll run into.


> I'm speaking about the problems they'll run into.

What are these mythical problems you speak of? I'd love to hear some specifics, because I haven't hit them yet.



